ZeroOneCreative/amara-spatial-10k
Source: Hugging Face · Updated 2026-04-17 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/ZeroOneCreative/amara-spatial-10k
Description:
---
license: cc-by-4.0
task_categories:
- text-to-3d
- image-to-3d
size_categories:
- 10K<n<100K
tags:
- 3d
- mesh
- glb
- synthetic
- spatial
- pbr
- webdataset
- embodied-ai
pretty_name: AmaraSpatial-10K
configs:
- config_name: default
  data_files:
  - split: train
    path: "metadata/*.parquet"
---
# AmaraSpatial-10K
### A Semantically Anchored, Metric-Scale 3D Dataset for Embodied AI and Spatial Computing

**10,071 AI-generated 3D meshes across 65 categories** — from basilisks to bassoons, cottages to cosmic stations — curated by **Zero One Creative** to close the *spatial alignment gap* that makes most generative 3D repositories unusable for zero-shot deployment in game engines, robotics simulators, and AR/VR pipelines.
Every asset is simultaneously **metric-scaled**, **semantically anchored**, **PBR-ready**, and **richly described** — four properties that, to our knowledge, do not co-occur in any other public 3D dataset at this scale.
---
## Why this dataset exists
Recent image-to-3D models can produce plausible meshes, but their outputs are spatially *ungrounded*: a generated chair may be 40 m tall, oriented sideways, with its pivot point floating at the centroid. Large repositories inherit and compound this problem: ShapeNet has no PBR materials, Objaverse has severe quality variance and arbitrary scale, and GSO is metric-accurate but contains only ~1K assets.
The next evolution of 3D datasets is not pure volume, but **spatial and semantic alignment**. AmaraSpatial-10K is curated to be that.
### The four properties, all at once
- 🟡 **Real-world metric scaling.** Assets are scaled to true physical dimensions in metres and validated by a novel **Scale Plausibility Score (SPS)** using an independent LLM-as-judge.
- 🟡 **Semantic origin anchoring.** Origins are placed by functional context — bottom-centre for ground-resting items (chairs, tables), centre for suspended objects (chandeliers, drones), top-centre for ceiling-mounted items.
- 🟡 **Production-ready PBR & physics.** Main meshes are decimated to ~50K triangles with separated Normal/Roughness maps (no baked lighting), and ship with a paired convex collision hull (<500 triangles).
- 🟡 **Rich multi-modal metadata.** Every asset includes multi-sentence descriptions, a 2D seed image, and five camera renders, yielding ~18× the descriptive concept density of Objaverse tags.



---
## Key results at a glance
Averages across 9 evaluated categories (5,247 assets in AmaraSpatial-10K, 2,856 matched in Objaverse):
| Metric | AmaraSpatial-10K | Objaverse (matched) |
|---|---|---|
| Mean bounding-box height across 9 categories | **3.89 m** | 1,723 m |
| Intra-category scale **CV** (9-category mean) ↓ | **3.40** | 9.92 |
| Seating assets in plausible range [0.6, 1.1] m ↑ | **40.7 %** | 7.7 % |
| Mean **SPS** ↑ | **0.68** | — |
| Assets within plausible size range (aggregate) ↑ | **29.5 %** | — |
| Anchor within 1 cm of semantic target ↑ | **79.7 %** | 4.2 % |
| Anchors outside object bounding box ↓ | **5.2 %** | 35.2 % |
| CLIP Text ↔ 3D coherence ↑ | **0.238** | 0.203 |
| LLM Concept Density (0–5) ↑ | **2.62** | 0.14 |
| UV-mapped ↑ | **100 %** | 94 % |
Where SPS and CV stand for:
- **Scale Plausibility Score (SPS)** — a continuous score in [0, 1]. An asset whose measured primary dimension falls inside an LLM-judged plausible interval `[ℓ, u]` scores 1.0; outside, SPS decays as a Gaussian normalised by the interval half-width `h = (u − ℓ) / 2`. The normalisation means narrow-range categories (tea cup: 7–12 cm) and wide-range ones (building: 3–100 m) are penalised on the same *relative* scale. The interval itself comes from an *independent* LLM instance that never sees our dataset.
- **Coefficient of Variation (CV)** — `σ / x̄` of a category's bounding-box heights. Low CV means every chair is roughly chair-sized; high CV means the category contains objects spanning orders of magnitude.
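
As a rough illustration of the two definitions above, the sketch below computes an SPS-style score and a CV in plain Python. The exact decay used for the published numbers is not spelled out here, so the form `exp(-0.5 * (d / h)^2)`, with `d` the distance from the measurement to the nearest interval bound, is an assumption. The function names and the use of the population standard deviation are likewise illustrative choices.

```python
import math
import statistics


def scale_plausibility_score(measured_m, lower_m, upper_m):
    """SPS-style score in [0, 1] for one asset (assumed decay form).

    Assumes lower_m < upper_m, so the half-width is non-zero.
    """
    if lower_m <= measured_m <= upper_m:
        return 1.0  # inside the plausible interval
    half_width = (upper_m - lower_m) / 2.0
    # Distance to the nearest bound, later expressed in half-widths.
    if measured_m < lower_m:
        distance = lower_m - measured_m
    else:
        distance = measured_m - upper_m
    return math.exp(-0.5 * (distance / half_width) ** 2)


def coefficient_of_variation(heights_m):
    """CV = sigma / mean of a category's bounding-box heights."""
    return statistics.pstdev(heights_m) / statistics.fmean(heights_m)


# A 0.9 m chair lies inside [0.6, 1.1] m; a 40 m "chair" is far outside it.
print(scale_plausibility_score(0.9, 0.6, 1.1))
print(scale_plausibility_score(40.0, 0.6, 1.1))
print(coefficient_of_variation([0.8, 0.9, 1.0]))
```

Because the distance is divided by the half-width, a tea cup that overshoots its interval by one half-width receives the same score as a building that overshoots its own interval by one half-width.
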
### What the numbers actually say
- **Scale is physical, not arbitrary.** Across nine evaluated categories, AmaraSpatial-10K's 5,247 assets have a mean bounding-box height of **3.89 m**. The matched 2,856 Objaverse assets average **1,723 m**, roughly 440× larger (between two and three orders of magnitude), driven by outliers spanning from 2 cm to over 100 km within a single category.
- **2.9× tighter intra-category distributions.** Mean CV of **3.40** across nine categories vs. **9.92** for Objaverse. Individual categories improve dramatically — Seating drops from CV 11.75 → 1.03, Tableware from 10.13 → 2.17.
- **Scale plausibility, directly measured.** **40.7 %** of our seating assets fall in the physically plausible height range [0.6, 1.1] m, vs. only **7.7 %** in Objaverse. On our own dataset, the aggregate mean SPS across 5,247 assets is **0.68**, with **29.5 %** scoring a perfect 1.0.
- **Anchors you can actually build on.** **79.7 %** of assets land within 1 cm of their semantically correct anchor (bottom-centre, centre, or top-centre), vs. **4.2 %** in Objaverse. Only **5.2 %** of our anchors fall outside the object's own bounding box, vs. **35.2 %** in Objaverse.
- **18× richer descriptions.** Each description covers, on average, **2.62** of the 5 core visual constraint axes (Color, Material, Style, Shape, Component) used by text-to-3D models — vs. **0.14** for Objaverse tags.
See **"Generation and QC methodology"** below for how every metric is computed.
---
## At a glance
| | |
|---|---|
| **Assets** | 10,071 |
| **Total size** | >130 GB |
| **Top categories** | 11 core themes, 65 top-level classes (`ClassLabel`) |
| **Sub-categories** | 476 (`ClassLabel`) |
| **Metadata format** | Parquet (with HF `Image` features inline) |
| **Mesh format** | WebDataset `.tar` shards containing GLB binaries |
| **Texture size** | 2048 × 2048 |
| **Mean face count** | ~47,000 (main mesh), <500 (collision hull) |
| **Licence** | CC BY 4.0 |
---
## What's in the box
Every asset ships with:
- **A seed image** — the text-conditioned synthesis image used to generate the mesh.
- **A main GLB mesh** — metric-scaled, semantically anchored, UV-unwrapped, ~10 MB typical, 2K PBR textures.
- **A collision GLB** — simplified convex hull for physics and raycasting.
- **Five camera renders** — one perspective "doll-house" view plus four cardinal orthographic views (front, back, left, right).
- **Rich metadata** — 28 geometric and quality metrics, multi-sentence descriptions, structured category labels, and spatial orientation data.
Every column is filterable. Query "all animals with >80 % watertightness and <50K vertices" with a single Parquet predicate.
---
## Repository layout
```text
metadata/
  train-00000-of-00006.parquet    ~2.5 GB each, 6 shards
  train-00001-of-00006.parquet
  …
meshes/
  shard-00000.tar                 ~5 GB each, 21 shards
  shard-00001.tar                 each tar contains <asset_id>.glb + <asset_id>.collision.glb
  …
manifest.parquet                  asset_id → mesh_shard + category labels (small index)
top_categories.json               65 sorted ClassLabel names
sub_categories.json               476 sorted ClassLabel names
figures/                          README figures (hero, category donut, etc.)
```
You don't need to download 130 GB to poke around. The metadata Parquet files (~15 GB) hold everything except the meshes (descriptions, renders, quality scores) and download in minutes. The mesh tars (~115 GB) only matter when you actually want the 3D files.
---
## Schema
Every row in `metadata/*.parquet` has:
- **Identity**: `asset_id` (primary key), `top_category`, `sub_category`, `asset_basename`
- **Prompt**: `brief_description`, `full_description`
- **Visual** (HF `Image` features): `seed_image`, `render_perspective`, `render_front`, `render_back`, `render_left`, `render_right`
- **Mesh pointers**: `mesh_shard`, `mesh_path`, `collision_path` (join into the matching tar)
- **Geometry**: `vertices`, `decimation_faces`, `approx_islands`, `texture_size`, `aabb[3]`, `anchor_origin[3]`, `forward_axis`
- **Quality**: `watertight_percent`, `manifold_edge_ratio`, `degenerate_triangle_count`, `non_manifold_vertices`, `has_uv_coordinates`, `euler_number`, `unique_edges`
- **Collision mesh**: `collision_volume_ratio`, `collision_vertices`, `collision_faces`
- **Derived geometry**: `surface_area`, `mesh_volume`, `bounding_box_volume`, `average_edge_length`, `aspect_ratio`
---
## Quickstart
### Browse and filter metadata (~15 GB)
```python
from datasets import load_dataset
ds = load_dataset("ZeroOneCreative/amara-spatial-10k", split="train")
print(ds)
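# High-quality animals only.
# If `top_category` is stored as a ClassLabel integer, compare against
# ds.features["top_category"].str2int("Animals") instead of the string.
animals = ds.filter(
    lambda r: r["top_category"] == "Animals" and r["watertight_percent"] > 80
)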
print(f"{len(animals)} clean animal meshes")
animals[0]["render_perspective"].show()
```
### Stream meshes for training
```python
import webdataset as wds
url = "https://huggingface.co/datasets/ZeroOneCreative/amara-spatial-10k/resolve/main/meshes/shard-{00000..00020}.tar"
pipeline = wds.WebDataset(url, shardshuffle=True).shuffle(1000)
for sample in pipeline:
    asset_id = sample["__key__"]          # e.g. "Animals_Dragon_SM_MeshGen_FireDragon"
    glb_bytes = sample["glb"]             # main mesh
    coll_bytes = sample["collision.glb"]  # collision mesh
    # Join with metadata by asset_id for prompts + geometry fields
```
### Fetch a single asset by ID
```python
from huggingface_hub import hf_hub_download
import tarfile
row = next(r for r in ds if r["asset_id"] == "Animals_Dragon_SM_MeshGen_FireDragon")
shard = hf_hub_download(
    "ZeroOneCreative/amara-spatial-10k",
    f"meshes/shard-{row['mesh_shard']:05d}.tar",
    repo_type="dataset",
)
# Read the main mesh straight out of the downloaded shard
with tarfile.open(shard) as t:
    glb_bytes = t.extractfile(row["mesh_path"]).read()
```
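
Before handing `glb_bytes` to a mesh loader, it can be useful to sanity-check the container. The sketch below reads the 12-byte header and the chunk table defined by the glTF 2.0 binary (GLB) format using only the standard library. The function name `inspect_glb` and the layout of the returned dictionary are illustrative choices, not part of this dataset's tooling.

```python
import json
import struct

GLB_MAGIC = 0x46546C67   # ASCII "glTF" read as a little-endian uint32
CHUNK_JSON = 0x4E4F534A  # ASCII "JSON"
CHUNK_BIN = 0x004E4942   # ASCII "BIN\0"


def inspect_glb(glb_bytes):
    """Return version, declared length, JSON scene dict and BIN chunk size."""
    magic, version, length = struct.unpack_from("<III", glb_bytes, 0)
    if magic != GLB_MAGIC:
        raise ValueError("not a GLB file (bad magic)")
    info = {"version": version, "length": length, "json": None, "bin_size": 0}
    offset = 12  # chunks start right after the header
    while offset + 8 <= min(length, len(glb_bytes)):
        chunk_len, chunk_type = struct.unpack_from("<II", glb_bytes, offset)
        payload = glb_bytes[offset + 8 : offset + 8 + chunk_len]
        if chunk_type == CHUNK_JSON:
            info["json"] = json.loads(payload.decode("utf-8"))
        elif chunk_type == CHUNK_BIN:
            info["bin_size"] = chunk_len
        offset += 8 + chunk_len
    return info
```

Calling `inspect_glb(glb_bytes)` on an extracted asset should report `version == 2` and a `length` equal to `len(glb_bytes)` for a well-formed file. The parsed `json` entry exposes the glTF scene description (meshes, materials, textures) without decoding any geometry.
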
### Download the whole dataset (~130 GB)
```bash
hf download ZeroOneCreative/amara-spatial-10k --repo-type dataset --local-dir ./spatial-10k
```
Resumable and parallel. Use `--include "metadata/*"` to grab only the metadata side.
---
## Generation and QC methodology
Every asset was produced through Zero One Creative's synthesis pipeline:
```
text-to-image seed → image-to-3D mesh → spatial alignment & scaling →
UV unwrap → mesh decimation → collision-hull simplification → multi-view render
```
### Spatial alignment
Each raw mesh is transformed by a semantically determined rigid transform plus metric scale:
- **Metric scale** — an LLM-estimated physical dimension (in metres) for the asset's subcategory sets the scale factor.
- **Rotation** — PCA combined with semantic heuristics orients each mesh so its functional front faces +X and its vertical axis aligns to +Z.
- **Anchor translation** — origin placed at bottom-centre for ground-resting objects, centre for suspended objects, top-centre for ceiling-mounted objects.
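
The anchor rule can be sketched as follows, assuming +Z is up (as stated above) and an axis-aligned bounding box given by its min and max corners. The mode names, the function names, and the Euclidean distance used for the error are illustrative assumptions rather than the pipeline's actual code.

```python
import math


def semantic_anchor(bbox_min, bbox_max, mode):
    """Target anchor point for a +Z-up axis-aligned bounding box.

    mode is a hypothetical label: "bottom" (ground-resting),
    "centre" (suspended) or "top" (ceiling-mounted).
    """
    cx = (bbox_min[0] + bbox_max[0]) / 2.0
    cy = (bbox_min[1] + bbox_max[1]) / 2.0
    z_by_mode = {
        "bottom": bbox_min[2],
        "centre": (bbox_min[2] + bbox_max[2]) / 2.0,
        "top": bbox_max[2],
    }
    return (cx, cy, z_by_mode[mode])


def translate_to_anchor(vertices, bbox_min, bbox_max, mode):
    """Shift vertices so the semantic anchor lands on the origin."""
    ax, ay, az = semantic_anchor(bbox_min, bbox_max, mode)
    return [(x - ax, y - ay, z - az) for x, y, z in vertices]


def anchor_error_m(bbox_min, bbox_max, mode, origin=(0.0, 0.0, 0.0)):
    """Distance in metres between the mesh origin and its target anchor."""
    return math.dist(origin, semantic_anchor(bbox_min, bbox_max, mode))
```

With a helper like `anchor_error_m`, a check in the spirit of the "within 1 cm of semantic target" metric becomes `anchor_error_m(...) <= 0.01`.
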
### Quality checks
Every output was rigorously quality-checked on both the main mesh and the collision mesh:
| Check | Metric | Column |
|---|---|---|
| Closed-surface completeness | % watertight triangulation | `watertight_percent` |
| Manifold geometry | Fraction of edges shared by exactly 2 faces | `manifold_edge_ratio` |
| Degenerate triangles | Zero-area / collinear triangle count | `degenerate_triangle_count` |
| Non-manifold vertices | Vertices where the surface self-intersects | `non_manifold_vertices` |
| Topology | Euler characteristic | `euler_number` |
| Collision fit | Collision-hull volume / main-mesh volume | `collision_volume_ratio` |
| UV coverage | Whether UV coordinates are present | `has_uv_coordinates` |
Every metric is a top-level column rather than a buried JSON blob — **filter for your own quality bar rather than accepting ours.** We deliberately kept borderline-watertight meshes because the optimal threshold depends heavily on downstream use (rendering vs. physics simulation).
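
Two of the checks above can be reproduced from a triangle list with the standard library alone. The sketch assumes faces are given as triples of vertex indices and counts unique undirected edges. The function name and return keys are hypothetical, and a production pipeline would more likely rely on a dedicated mesh library.

```python
from collections import Counter


def topology_metrics(vertex_count, faces):
    """Euler characteristic and manifold-edge ratio for a triangle mesh."""
    edge_use = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edge_use[(min(u, v), max(u, v))] += 1  # undirected edge key
    unique_edges = len(edge_use)
    # An edge is manifold when exactly two faces share it.
    manifold_edges = sum(1 for n in edge_use.values() if n == 2)
    return {
        "unique_edges": unique_edges,
        # V - E + F; equals 2 for a single closed, sphere-like surface
        "euler_number": vertex_count - unique_edges + len(faces),
        "manifold_edge_ratio": (
            manifold_edges / unique_edges if unique_edges else 0.0
        ),
    }


# A tetrahedron is closed: every edge is shared by exactly two faces.
tetra = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
print(topology_metrics(4, tetra))
```
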
---
## Intended uses
AmaraSpatial-10K is designed to drop into:
- **LLM-driven scene composition** — correct scale and anchors reduce floating objects and interpenetrations without algorithmic changes.
- **Embodied AI and robotics simulators** — metric scale and PBR materials shrink the sim-to-real gap.
- **Text-to-3D / image-to-3D training & evaluation** — aligned text ↔ image ↔ mesh triplets enable cross-modal objectives.
- **Retrieval systems** — multi-sentence descriptions significantly outperform sparse tags under CLIP and LLM-embedding similarity.
- **Game-engine prototyping** — production-ready GLB with collision hulls, usable zero-shot in Unreal, Unity, or Godot.
---
## Licence
Released under **Creative Commons Attribution 4.0 International (CC BY 4.0)**. You are free to use, remix, redistribute, and build upon the assets for any purpose including commercial, provided you give appropriate credit.
---
*Built by [Zero One Creative](https://01c.ai).*
Provider:
ZeroOneCreative



