tom-jerry-123/Physical-AI-AV-ES
Hugging Face · Updated 2026-04-18 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/tom-jerry-123/Physical-AI-AV-ES
Resource description:
---
license: other
task_categories:
- robotics
tags:
- autonomous-driving
- waypoints
- webdataset
size_categories:
- 1M<n<10M
---
# PhysicalAI-AV-SFT
A supervised fine-tuning (SFT) dataset for an autonomous-vehicle vision-language
waypoint-prediction model. It contains **29,674 samples** drawn from ~150,000
driving scenes recorded in the United States. Each scene lasts about 18 seconds
and is sampled at one-second anchor times from 2 s to 16 s.
## Format
[WebDataset](https://github.com/webdataset/webdataset): 3 uncompressed `.tar` shards,
each storing one pair of files per sample:
| Entry | Description |
|---|---|
| `{key}.jpg` | Front-facing wide-angle camera frame (JPEG quality 95, 640 × 360 px) |
| `{key}.json` | Metadata (see schema below) |

**Key format:** `{scene_id}__{sample_idx:02d}`

**Shard assignment:** `sha256(scene_id) % 3`. All frames of a scene land
in the same shard, so a train/eval split made by shard cannot leak a scene across splits.
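
The key and shard rules above can be sketched in a few lines of Python. This is an illustration rather than the dataset's actual build script: the helper names `shard_for_scene` and `sample_key` are hypothetical, and the way the SHA-256 digest is turned into an integer (UTF-8 input, big-endian digest) is an assumption, so the shard numbers it produces may not match the published shards.

```python
import hashlib

NUM_SHARDS = 3  # shard count stated in this README


def shard_for_scene(scene_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a scene to a shard index in [0, num_shards)."""
    # Assumption: the scene ID is hashed as UTF-8 bytes and the digest is
    # read as one big-endian integer before taking the modulus.
    digest = hashlib.sha256(scene_id.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_shards


def sample_key(scene_id: str, sample_idx: int) -> str:
    """Build the `{scene_id}__{sample_idx:02d}` key."""
    return f"{scene_id}__{sample_idx:02d}"
```

Because the shard depends only on `scene_id`, every anchor of a scene (`sample_idx` 2 to 16) maps to the same shard regardless of the exact digest-to-integer convention.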
## Metadata schema (`{key}.json`)
```json
{
"scene_id": "UUID string — identifies the driving scene",
"chunk_name": "chunk_XXXX — source data chunk",
"sample_idx": "int 2–16 — target second within the scene (15 anchors/scene)",
"global_idx": "int — globally unique datum index",
"target_t_rel_us": "int — timestamp relative to scene start (microseconds)",
"target_frame_index": "int — video frame index",
"egomotion": "list[list[float]] — full available past trajectory (incl. anchor), target-relative, 0.25s granularity",
"waypoints": "list[list[float]] — full available future trajectory, target-relative, 0.25s granularity",
"is_long_tail": "bool — long-tail driving scenario flag"
}
```
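
A loaded metadata record can be checked against the schema above. The sketch below is one possible validator, not part of the dataset tooling: the name `check_meta` is hypothetical, and it only checks top-level field types and the documented `sample_idx` range, not the inner layout of `egomotion` or `waypoints`.

```python
# Top-level field types, taken from the schema in this README.
EXPECTED_TYPES = {
    "scene_id": str,
    "chunk_name": str,
    "sample_idx": int,
    "global_idx": int,
    "target_t_rel_us": int,
    "target_frame_index": int,
    "egomotion": list,
    "waypoints": list,
    "is_long_tail": bool,
}


def check_meta(meta: dict) -> list:
    """Return a list of human-readable problems; empty if none were found."""
    problems = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in meta:
            problems.append(f"missing field: {name}")
            continue
        value = meta[name]
        # bool is a subclass of int in Python, so reject it for int fields.
        wrong_bool = expected is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    idx = meta.get("sample_idx")
    if isinstance(idx, int) and not isinstance(idx, bool) and not 2 <= idx <= 16:
        problems.append(f"sample_idx out of range 2..16: {idx}")
    return problems
```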
Coordinate convention: all `x`/`y`/`yaw` values are expressed in the **ego-vehicle frame** at the target time:
+x is forward, +y is left, and yaw is in radians, counter-clockwise (CCW) from forward.
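
To make the convention concrete, the sketch below maps one ego-frame pose into a fixed 2D world frame. It rests on assumptions not stated in this README: that each trajectory row is ordered `[x, y, yaw]`, and that the world frame is right-handed with the ego heading measured CCW from world +x. The function `ego_to_world` is a hypothetical helper.

```python
import math


def ego_to_world(point, ego_pose):
    """Transform an ego-frame (x, y, yaw) into a world frame.

    point:    (x, y, yaw) in the ego frame (+x forward, +y left, yaw CCW).
    ego_pose: (X, Y, heading) of the ego vehicle in the world frame
              (assumed right-handed, heading CCW from world +x).
    """
    x, y, yaw = point
    ex, ey, heading = ego_pose
    c, s = math.cos(heading), math.sin(heading)
    # Standard 2D rotation by the ego heading, then translation.
    wx = ex + c * x - s * y
    wy = ey + s * x + c * y
    # Wrap the summed angle into (-pi, pi].
    total = heading + yaw
    wyaw = math.atan2(math.sin(total), math.cos(total))
    return wx, wy, wyaw
```

With an ego heading of 90° CCW, a waypoint 1 m straight ahead, `(1, 0, 0)`, lands 1 m along world +y, and with heading 0 a waypoint 1 m to the left, `(0, 1, 0)`, stays at world `(0, 1)`.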
## Loading
```python
import io
import json

import webdataset as wds
from PIL import Image

# Local (after cloning the repo)
ds = wds.WebDataset("shards/train-{00000..00002}-of-00003.tar").shuffle(1000)
for sample in ds:
    img = Image.open(io.BytesIO(sample["jpg"]))  # decode the JPEG bytes
    meta = json.loads(sample["json"])  # parse the metadata record
    # meta["waypoints"] → full available future waypoints at 0.25s granularity
```
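
Since the shards are plain uncompressed tar files, they can also be read with the standard library alone, for example to inspect a shard without installing `webdataset`. The sketch below is a minimal illustration: `iter_samples` is a hypothetical helper, and it assumes every member is named `{key}.jpg` or `{key}.json` as described in the Format section.

```python
import json
import tarfile


def iter_samples(fileobj):
    """Yield (key, jpeg_bytes, meta) for each complete jpg/json pair in a tar."""
    pending = {}  # key -> {"jpg": bytes, "json": bytes} for half-seen pairs
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:  # "r:" = no compression
        for member in tar:
            if not member.isfile():
                continue
            key, _, ext = member.name.rpartition(".")
            if ext not in ("jpg", "json"):
                continue
            parts = pending.setdefault(key, {})
            parts[ext] = tar.extractfile(member).read()
            if len(parts) == 2:  # both files of the pair have been read
                del pending[key]
                yield key, parts["jpg"], json.loads(parts["json"])
```

Usage would look like `with open("shards/train-00000-of-00003.tar", "rb") as f: for key, jpg, meta in iter_samples(f): ...`, using the shard path from the loading example above.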
## Shard index
`index.parquet` — one row per sample, columns:
`key`, `shard`, `scene_id`, `chunk_name`, `sample_idx`, `global_idx`,
`target_t_rel_us`, `is_long_tail`.
```python
import pandas as pd
df = pd.read_parquet("index.parquet")
lt = df[df["is_long_tail"]] # long-tail subset
```
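
The index also makes it easy to build a scene-safe split by holding out whole shards. The sketch below is one way to do that, not a split prescribed by the dataset: `split_by_shard` is a hypothetical helper, and the values stored in the `shard` column (integers or file names) are an assumption, so pass whatever values the column actually contains.

```python
import pandas as pd


def split_by_shard(df: pd.DataFrame, eval_shards):
    """Split index rows into (train, eval) by the `shard` column."""
    is_eval = df["shard"].isin(eval_shards)
    train, held_out = df[~is_eval], df[is_eval]
    # Sanity check: a scene must never appear on both sides of the split.
    overlap = set(train["scene_id"]) & set(held_out["scene_id"])
    if overlap:
        raise ValueError(f"{len(overlap)} scene(s) appear in both splits")
    return train, held_out
```

For example, `train, held_out = split_by_shard(pd.read_parquet("index.parquet"), eval_shards=[2])` would hold out one shard, assuming shards are numbered as integers.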
Provider:
tom-jerry-123



