initialneil/TEDWB1k-preview
Hugging Face | Updated 2026-04-09 | Indexed 2026-04-12
---
license: cc-by-nc-nd-4.0
language:
- en
pretty_name: TEDWB1k-preview
size_categories:
- 1K<n<10K
task_categories:
- other
tags:
- 3d-human
- smpl-x
- flame
- avatar
- gaussian-splatting
- video
- motion-capture
- ted
- preview
configs:
- config_name: subjects
  data_files:
  - split: train
    path: metadata/subjects_train.parquet
  - split: train_subset_x1
    path: metadata/subjects_train_subset_x1.parquet
  - split: train_subset_x12
    path: metadata/subjects_train_subset_x12.parquet
  - split: train_val
    path: metadata/subjects_train_val.parquet
  - split: test
    path: metadata/subjects_test.parquet
---
# TEDWB1k-preview
> ⚠️ **This is the public preview of [`initialneil/TEDWB1k`](https://huggingface.co/datasets/initialneil/TEDWB1k).**
> The full dataset (1,431 TED-talk speakers, ~120 GB) is hosted at the gated repo above.
> This preview repo exists so the Hugging Face **Dataset Viewer** can render the
> per-subject thumbnails and tracking grids without going through the gated EULA flow.
>
> If you want the **full training data** (frames + mattes + per-frame SMPL-X / FLAME
> tracking for all 1,431 subjects), go to the gated main repo, accept the agreement,
> and use [`load_tedwb1k.py`](https://huggingface.co/datasets/initialneil/TEDWB1k/blob/main/load_tedwb1k.py).
## What's in this preview
This repo is **schema-identical** to the gated main repo, but the heavy
per-subject data (`frames.tar`, `mattes.tar`, tracking pickles) is included
**only for the 12 subjects in `train_subset_x12`** as a working sample. For
the other 1,419 subjects, only the metadata and QC visualizations are
present (so the viewer table still lists all 1,431 entries).
| What | Subjects | Size |
|---|---:|---:|
| Per-split parquets with embedded source-frame thumbnails | 1,431 | ~210 MB |
| `metadata/previews/<id>.jpg` (1024×1024 source frames) | 1,431 | ~210 MB |
| `metadata/ehm/<id>.jpg` (full-res SMPL-X overlay grids) | 1,431 | ~17.6 GB |
| `metadata/flame/<id>.jpg` (full-res FLAME overlay grids) | 1,431 | ~8.0 GB |
| `metadata/base/<id>.jpg` (full-res PIXIE+Sapiens grids) | 1,431 | ~5.2 GB |
| **Per-subject heavy data** (frames.tar + mattes.tar + tracking) | **12** | ~540 MB |
| **Total preview** | | **~32 GB** |
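
As a quick sanity check on the table, the rows can be summed to confirm the stated total. The figures below are the approximate sizes copied from the table, so the result is only as precise as those roundings.

```python
# Approximate sizes in GB, copied from the table above.
sizes_gb = {
    "split parquets": 0.21,
    "metadata/previews": 0.21,
    "metadata/ehm": 17.6,
    "metadata/flame": 8.0,
    "metadata/base": 5.2,
    "heavy data (12 subjects)": 0.54,
}

total_gb = sum(sizes_gb.values())

# Share of the preview taken up by the three full-resolution overlay grids.
overlay_gb = sizes_gb["metadata/ehm"] + sizes_gb["metadata/flame"] + sizes_gb["metadata/base"]
overlay_share = overlay_gb / total_gb

print(f"total: {total_gb:.2f} GB, overlay grids: {overlay_share:.0%}")
```

Almost all of the preview's size comes from the overlay grids, which is why the `allow_patterns` filter in the quick start below is worth using instead of cloning the whole repo.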
The HF Dataset Viewer above renders 5 tabs (`train`, `train_subset_x1`,
`train_subset_x12`, `train_val`, `test`) with one row per subject, the
per-subject frame and shot counts, and a thumbnail of the first source
frame. Each thumbnail is the actual `shots_images/<id>/<first_shot>/000000.jpg`
that the tracker consumed.
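
The same per-subject table can be loaded outside the viewer. The sketch below assumes the `datasets` library is installed and uses the `subjects` config and split names from the YAML header of this card. The helper name `load_subject_table` is hypothetical, and the column names are not documented here, so the usage lines only print whatever columns are present.

```python
# Split names as declared under `configs` in this card's YAML header.
SPLITS = ("train", "train_subset_x1", "train_subset_x12", "train_val", "test")


def load_subject_table(split, repo_id="initialneil/TEDWB1k-preview"):
    """Load one split of the `subjects` config (hypothetical helper)."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the validation above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, "subjects", split=split)


# Usage (needs network access):
#   table = load_subject_table("train_subset_x12")
#   print(table.num_rows, table.column_names)
```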
## Quick start
If you only want to play with one of the 12 sample subjects (no agreement
required):
```bash
pip install huggingface_hub
python -c "
from huggingface_hub import snapshot_download
snapshot_download(
    'initialneil/TEDWB1k-preview',
    repo_type='dataset',
    allow_patterns='subjects/-2Dj9M71JAc/*',
    local_dir='./tedwb1k_x1',
)
"
```
That gives you tracking pickles + `frames.tar` + `mattes.tar` for one
sample subject in a few seconds. To turn it into the 5-file bundle that
[HolisticAvatar](https://github.com/initialneil/HolisticAvatar)'s
`TrackedData` expects, use the same `load_tedwb1k.py` from the main repo:
```bash
wget https://huggingface.co/datasets/initialneil/TEDWB1k/raw/main/load_tedwb1k.py
python load_tedwb1k.py --split train_subset_x1 --out ./tedwb1k_x1 \
    --repo_id initialneil/TEDWB1k-preview
```
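If you would rather inspect the raw archives yourself than go through `load_tedwb1k.py`, a minimal sketch for unpacking one of them follows. It assumes `frames.tar` sits directly under `subjects/<id>/` in the downloaded snapshot, which the `allow_patterns` value above suggests but this card does not state. The internal layout of the archive is not documented here, so the hypothetical helper `extract_tar` simply returns whatever files it finds.

```python
import tarfile
from pathlib import Path


def extract_tar(tar_path, out_dir):
    """Extract regular files and directories from tar_path into out_dir.

    Members whose path would resolve outside out_dir are rejected.
    """
    out_dir = Path(out_dir).resolve()
    out_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as tar:
        # Skip links and special files; keep plain files and directories only.
        members = [m for m in tar.getmembers() if m.isfile() or m.isdir()]
        for member in members:
            target = (out_dir / member.name).resolve()
            if target != out_dir and out_dir not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(out_dir, members=members)
    return sorted(p for p in out_dir.rglob("*") if p.is_file())


# Usage (the path is an assumption based on the quick-start download):
#   files = extract_tar("./tedwb1k_x1/subjects/-2Dj9M71JAc/frames.tar", "./frames")
#   print(len(files), files[:3])
```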
For the **full 1,361-subject training set**, request access at the
[gated main repo](https://huggingface.co/datasets/initialneil/TEDWB1k).
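Within this preview, several sample subjects can be pulled in one call, because `allow_patterns` also accepts a list of patterns. The sketch below builds those patterns from subject ids. Both helper names are hypothetical, and apart from `-2Dj9M71JAc` (taken from the quick start above) the ids would have to be read from the `train_subset_x12` table.

```python
def subject_patterns(subject_ids):
    """Build one `subjects/<id>/*` glob per subject id."""
    subject_ids = list(subject_ids)
    if not subject_ids:
        # An empty pattern list would match nothing, so fail early instead.
        raise ValueError("at least one subject id is required")
    return [f"subjects/{sid}/*" for sid in subject_ids]


def download_subjects(subject_ids, local_dir, repo_id="initialneil/TEDWB1k-preview"):
    """Download the heavy data for the given subjects (needs network access)."""
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id,
        repo_type="dataset",
        allow_patterns=subject_patterns(subject_ids),
        local_dir=local_dir,
    )


# Usage:
#   download_subjects(["-2Dj9M71JAc"], "./tedwb1k_x12")
```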
## Per-subject visualizations
Each of the 1,431 subjects has 4 standalone visualization files under
`metadata/`:
- `metadata/previews/<id>.jpg` — clean 1024×1024 source frame (the first
frame of the first shot). Embedded in the parquet preview column too.
- `metadata/ehm/<id>.jpg` — full-resolution SMPL-X overlay grid from the
final tracking stage (large vertical contact sheet, ~13 MB).
- `metadata/flame/<id>.jpg` — FLAME overlay grid from the intermediate
face-fitting stage (~6 MB).
- `metadata/base/<id>.jpg` — stage-1 PIXIE+Sapiens overlay grid (~4 MB).
You can fetch any single subject's visualization with one
`hf_hub_download` call:
```python
from huggingface_hub import hf_hub_download
path = hf_hub_download(
    'initialneil/TEDWB1k-preview',
    'metadata/ehm/05jJodDVJRQ.jpg',
    repo_type='dataset',
)
```
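To fetch all four visualizations for one subject, the repo paths can be generated from the `metadata/<kind>/<id>.jpg` pattern listed above. This is a minimal sketch, and the helper names `viz_paths` and `download_viz` are hypothetical.

```python
# The four visualization folders listed above.
VIZ_KINDS = ("previews", "ehm", "flame", "base")


def viz_paths(subject_id):
    """Map each visualization kind to its path inside the repo."""
    return {kind: f"metadata/{kind}/{subject_id}.jpg" for kind in VIZ_KINDS}


def download_viz(subject_id, repo_id="initialneil/TEDWB1k-preview"):
    """Download all four images and return their local paths (needs network access)."""
    from huggingface_hub import hf_hub_download

    return {
        kind: hf_hub_download(repo_id, path, repo_type="dataset")
        for kind, path in viz_paths(subject_id).items()
    }


# Usage:
#   local = download_viz("05jJodDVJRQ")
#   print(local["ehm"])
```

Going by the per-file sizes quoted above, this downloads a little over 20 MB per subject.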
## Splits
Same as the main repo:
| Split | Subjects | Notes |
|---|---:|---|
| `train_subset_x1` | 1 | tiny single-subject overfit (⊂ `train`) |
| `train_subset_x12` | 12 | small overfit (⊂ `train`) — **the only subjects with downloadable heavy data in this preview** |
| `train_val` | 20 | training monitor (⊂ `train`) |
| `test` | 70 | identity-disjoint evaluation |
| `train` | 1,361 | full training pool |
| **total** | **1,431** | |
`train` (1,361) and `test` (70) are identity-disjoint and together cover
all 1,431 subjects. `train_subset_x1`, `train_subset_x12`, and `train_val`
are subsets of `train`.
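
The relationships stated above are easy to verify once the subject ids of each split are in hand. The sketch below checks them on plain lists of ids. The toy ids are placeholders, not real subjects, and how the ids are obtained (for example, from the split parquets) is left open because the id column name is not documented here.

```python
# Subject counts from the table above.
EXPECTED_COUNTS = {
    "train": 1361,
    "test": 70,
    "train_subset_x1": 1,
    "train_subset_x12": 12,
    "train_val": 20,
}
SUBSETS_OF_TRAIN = ("train_subset_x1", "train_subset_x12", "train_val")


def check_split_structure(splits):
    """Return a list of violations of the split rules (empty if all hold)."""
    train = set(splits["train"])
    test = set(splits["test"])
    problems = []
    overlap = train & test
    if overlap:
        problems.append(f"train/test share {len(overlap)} subject(s)")
    for name in SUBSETS_OF_TRAIN:
        extra = set(splits[name]) - train
        if extra:
            problems.append(f"{name} has {len(extra)} subject(s) outside train")
    return problems


# Toy example with placeholder ids.
toy_splits = {
    "train": ["a", "b", "c"],
    "test": ["d"],
    "train_subset_x1": ["a"],
    "train_subset_x12": ["a", "b"],
    "train_val": ["c"],
}
print(check_split_structure(toy_splits))
```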
## Why two repos?
Hugging Face's Dataset Viewer cannot render tabs/thumbnails for **gated**
datasets — the worker that computes split names runs without a user
identity and can't satisfy the gating EULA. The full TEDWB1k is gated for
TED-content compliance, so to keep the viewer working we mirror the
metadata + a 12-subject sample to this public preview repo.
For the full discussion, see this thread:
<https://discuss.huggingface.co/t/after-gated-user-access-was-enabled-the-huggingface-not-showing-dataset-viewer/157333>
## License
[**CC-BY-NC-ND 4.0**](https://creativecommons.org/licenses/by-nc-nd/4.0/) —
same as the main repo. Non-commercial research use only. Attribution
required. **No derivatives** — you may not distribute modified or remixed
versions of this dataset.
The tracking parameters, JPG frames, and mattes are all derivative works of
TED talk videos, which are themselves licensed CC-BY-NC-ND on ted.com. This
dataset uses the same license to stay compatible with TED's source
restrictions.
## Links
- **Full gated dataset**: <https://huggingface.co/datasets/initialneil/TEDWB1k>
- Tracking pipeline: <https://github.com/initialneil/HolisticTracker>
- HolisticAvatar (downstream model): <https://github.com/initialneil/HolisticAvatar>
Provider: initialneil



