lance-format/lerobot-xvla-soft-fold
Source: Hugging Face · Updated 2026-02-27 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/lance-format/lerobot-xvla-soft-fold
Resource description:
---
license: apache-2.0
configs:
- config_name: frames
data_dir: data/frames.lance
- config_name: episodes
data_dir: data/episodes.lance
- config_name: videos
data_dir: data/videos.lance
task_categories:
- robotics
tags:
- LeRobot
---
This dataset was created using [LeRobot](https://github.com/huggingface/lerobot).
## Dataset Description
**Repository:** [X-VLA](https://thu-air-dream.github.io/X-VLA/)
**License:** Apache 2.0
**Paper:** *Zheng et al., 2025, “X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model”* ([arXiv:2510.10274](https://arxiv.org/pdf/2510.10274))
## What this dataset contains
This is the Lance-format version of [lerobot/xvla-soft-fold](https://huggingface.co/datasets/lerobot/xvla-soft-fold), designed for efficient frame-level sampling and sequential episode loading.
- `1,542` episodes
- `2,852,512` frames
- `20` FPS
- 3 camera streams per episode (`cam_high`, `cam_left_wrist`, `cam_right_wrist`)
- robot state vectors and action vectors aligned to frame timestamps
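As a quick sanity check on these figures, the arithmetic below derives the total recorded duration (per camera stream) and the average episode length from the counts listed above. It assumes the frame rate is a constant 20 FPS across all episodes, which the card implies but does not state explicitly.

```python
# Counts taken from the list above; a constant 20 FPS is an assumption.
NUM_EPISODES = 1_542
NUM_FRAMES = 2_852_512
FPS = 20

total_seconds = NUM_FRAMES / FPS
total_hours = total_seconds / 3600
avg_frames_per_episode = NUM_FRAMES / NUM_EPISODES
avg_seconds_per_episode = avg_frames_per_episode / FPS

print(f"total duration: {total_hours:.1f} h")
print(
    f"average episode: {avg_frames_per_episode:.0f} frames, "
    f"about {avg_seconds_per_episode:.1f} s"
)
# Prints:
# total duration: 39.6 h
# average episode: 1850 frames, about 92.5 s
```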
## Dataset structure
The dataset is organized under `data/` with three Lance tables:
### Frames table
This is the main table for model training and analytics at frame granularity. Each row is one frame with aligned state/action metadata and indexing fields so you can filter by episode, iterate temporally, or build sampled batches directly.
Schema:
- `observation_state` (`list<float>`): robot state vector for that frame.
- `action` (`list<float>`): action vector for that frame.
- `time_stamp` (`float`): original source timestamp field.
- `timestamp` (`float`): canonical frame timestamp.
- `frame_index` (`int64`): frame index within episode.
- `episode_index` (`int64`): parent episode id.
- `index` (`int64`): global frame index.
- `task_index` (`int64`): task id.
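To illustrate filtering by episode and iterating temporally, here is a minimal sketch that builds a filter expression for a contiguous frame range. The helper name `frame_range_filter` is hypothetical and not part of Lance; the resulting string is intended for the `filter=` argument of `scanner()`, and the commented-out call at the end needs network access, so it is shown for orientation only.

```python
def frame_range_filter(episode_index: int, start_frame: int, num_frames: int) -> str:
    """Build a SQL-style filter selecting a contiguous frame window in one episode.

    Hypothetical helper; column names follow the frames schema above.
    """
    if start_frame < 0 or num_frames <= 0:
        raise ValueError("start_frame must be >= 0 and num_frames must be > 0")
    end_frame = start_frame + num_frames  # exclusive upper bound
    return (
        f"episode_index = {episode_index} "
        f"AND frame_index >= {start_frame} "
        f"AND frame_index < {end_frame}"
    )


expr = frame_range_filter(episode_index=30, start_frame=20, num_frames=10)
print(expr)

# Intended use (requires network access and the `lance` package):
# import lance
# frames = lance.dataset(
#     "hf://datasets/lance-format/lerobot-xvla-soft-fold/data/frames.lance"
# )
# window = frames.scanner(
#     columns=["frame_index", "timestamp", "observation_state", "action"],
#     filter=expr,
# ).to_table()
```

Keeping the upper bound exclusive means consecutive windows can be requested back to back without overlapping frames.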
### Episodes table
This table is optimized for sequence-aware loading. Each row represents one complete episode and stores per-episode arrays (`timestamps`, `actions`, `observation_state`) plus per-camera video blobs and timestamp ranges. Use this table when you need contiguous windows, trajectory-level batching, or synchronized decoding from episode-level video chunks.
Schema:
- `episode_index` (`int64`, required): episode id.
- `task_index` (`int64`, required): task id.
- `fps` (`int32`, required): frame rate.
- `timestamps` (`list<float>`): per-frame timestamps for the episode.
- `actions` (`list<list<float>>`): per-frame action vectors.
- `observation_state` (`list<list<float>>`): per-frame robot state vectors.
- `observation_images_cam_high_video_blob` (`large_binary` blob): encoded video segment for `cam_high`.
- `observation_images_cam_high_from_timestamp` (`double`): segment start time for `cam_high`.
- `observation_images_cam_high_to_timestamp` (`double`): segment end time for `cam_high`.
- `observation_images_cam_left_wrist_video_blob` (`large_binary` blob): encoded video segment for `cam_left_wrist`.
- `observation_images_cam_left_wrist_from_timestamp` (`double`): segment start time for `cam_left_wrist`.
- `observation_images_cam_left_wrist_to_timestamp` (`double`): segment end time for `cam_left_wrist`.
- `observation_images_cam_right_wrist_video_blob` (`large_binary` blob): encoded video segment for `cam_right_wrist`.
- `observation_images_cam_right_wrist_from_timestamp` (`double`): segment start time for `cam_right_wrist`.
- `observation_images_cam_right_wrist_to_timestamp` (`double`): segment end time for `cam_right_wrist`.
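For contiguous windows, one option is to slice the per-episode arrays after loading a row. The sketch below uses synthetic data in place of a real row; `sliding_windows` is a hypothetical helper, and the 2-dimensional vectors are placeholders, since this card does not state the state or action dimensions.

```python
from typing import Iterator, Sequence


def sliding_windows(
    states: Sequence[Sequence[float]],
    actions: Sequence[Sequence[float]],
    size: int,
    stride: int = 1,
) -> Iterator[tuple[Sequence, Sequence]]:
    """Yield aligned (states, actions) windows of `size` consecutive frames."""
    if len(states) != len(actions):
        raise ValueError("states and actions must have one entry per frame")
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    # Stop early enough that every window holds exactly `size` frames.
    for start in range(0, len(states) - size + 1, stride):
        yield states[start : start + size], actions[start : start + size]


# Synthetic stand-in for one episodes-table row (5 frames, 2-D vectors).
row = {
    "observation_state": [[float(i), 0.0] for i in range(5)],
    "actions": [[0.0, float(i)] for i in range(5)],
}

windows = list(
    sliding_windows(row["observation_state"], row["actions"], size=3, stride=2)
)
print(len(windows))  # 2 windows, starting at frames 0 and 2
```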
### Videos table
This table stores raw MP4 payloads from the source and file-level provenance metadata. It is useful when you want direct access to original encoded video assets, integrity checks (`sha256`), or custom decoding pipelines that operate on the original video files themselves, rather than episode/frame abstractions.
Schema:
- `camera_angle` (`string`, required): camera key.
- `chunk_index` (`int32`): chunk id parsed from path.
- `file_index` (`int32`): file id parsed from path.
- `relative_path` (`string`, required): original relative path in dataset.
- `filename` (`string`, required): MP4 filename.
- `file_size_bytes` (`int64`, required): file size.
- `sha256` (`string`, required): SHA256 digest.
- `video_blob` (`large_binary`, required blob): raw MP4 bytes.
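An integrity check against the `sha256` column might look like the sketch below. It assumes the column holds the hexadecimal SHA-256 digest of the exact bytes in `video_blob`, which the schema suggests but does not spell out; `verify_video_blob` is a hypothetical helper, and the payload is demo bytes rather than a real MP4.

```python
import hashlib


def verify_video_blob(video_blob: bytes, expected_sha256: str) -> bool:
    """Return True if the blob's SHA-256 hex digest matches the expected value.

    Assumes `expected_sha256` is a hex string; case and surrounding
    whitespace are ignored.
    """
    digest = hashlib.sha256(video_blob).hexdigest()
    return digest == expected_sha256.strip().lower()


# Demo bytes standing in for a `video_blob` value read from the videos table.
payload = b"not a real MP4, just demo bytes"
good = hashlib.sha256(payload).hexdigest()

print(verify_video_blob(payload, good))         # True
print(verify_video_blob(payload + b"x", good))  # False
```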
## Usage
In the following sections, we'll show how to work with the dataset in Lance or LanceDB.
### Read with Lance
```python
import lance
root_path = "hf://datasets/lance-format/lerobot-xvla-soft-fold/data"
frames_table_name = "frames.lance"
episodes_table_name = "episodes.lance"
videos_table_name = "videos.lance"
ds = lance.dataset(f"{root_path}/{frames_table_name}")
print(ds.count_rows())
ds = lance.dataset(f"{root_path}/{episodes_table_name}")
print(ds.count_rows())
ds = lance.dataset(f"{root_path}/{videos_table_name}")
print(ds.count_rows())
# Returns:
# 2852512
# 1542
# 104
```
### Inspect a few frames
```python
import lance
root_path = "hf://datasets/lance-format/lerobot-xvla-soft-fold/data"
frames_table_name = "frames.lance"
frames = lance.dataset(f"{root_path}/{frames_table_name}")
print(f"There are {frames.count_rows()} frames in total")
res = frames.scanner(
    columns=["episode_index", "frame_index", "timestamp"],
    limit=2,
).to_table()
print(res)
# Returns
# There are 2852512 frames in total
# pyarrow.Table
# episode_index: int64
# frame_index: int64
# timestamp: float
# ----
# episode_index: [[0,0]]
# frame_index: [[0,1]]
# timestamp: [[0,0.05]]
```
### Retrieving and saving video blobs
```py
from pathlib import Path
import lance
root_path = "hf://datasets/lance-format/lerobot-xvla-soft-fold/data"
episodes_table_name = "episodes.lance"
ds = lance.dataset(f"{root_path}/{episodes_table_name}")
out = Path("video_blobs")
out.mkdir(exist_ok=True)
# Retrieve the first two videos from the episodes table, one row per iteration
for offset in range(2):
    row = (
        ds.scanner(
            columns=["episode_index", "observation_images_cam_high_video_blob"],
            blob_handling="all_binary",
            limit=1,  # only the row at `offset` is used
            offset=offset,
        )
        .to_table()
        .to_pylist()[0]
    )
    # Write the video blob to a file
    (out / f"episode_{row['episode_index']}.mp4").write_bytes(
        row["observation_images_cam_high_video_blob"]
    )
```
This outputs the retrieved blobs as MP4 files in a local directory.
### Random seek on subsets of video
The snippet below reads one episode's video blob directly from the HF Hub via Lance and computes a short time window inside that episode. It then opens the blob as a stream (without downloading the full data to a local file), seeks to the start timestamp, and prints the blob size plus the seek positions in seconds and in stream PTS units.
```py
import av
import lance
DATASET_URI = "hf://datasets/lance-format/lerobot-xvla-soft-fold/data/episodes.lance"
EPISODE_INDEX = 30
START_OFFSET_S = 1.0
WINDOW_S = 0.5
ds = lance.dataset(DATASET_URI)
row = ds.scanner(
    columns=[
        "episode_index",
        "observation_images_cam_high_from_timestamp",
        "observation_images_cam_high_to_timestamp",
        "_rowid",
    ],
    with_row_id=True,
    filter=f"episode_index = {EPISODE_INDEX}",
    limit=1,
).to_table().to_pylist()[0]

# Clamp the window so it never runs past the end of the episode's segment
start_s = row["observation_images_cam_high_from_timestamp"] + START_OFFSET_S
end_s = min(
    start_s + WINDOW_S,
    row["observation_images_cam_high_to_timestamp"],
)

# Fetch the blob as a file-like object rather than saving it to disk
blob = ds.take_blobs("observation_images_cam_high_video_blob", ids=[row["_rowid"]])[0]
with av.open(blob) as container:
    stream = container.streams.video[0]
    # Ask the decoder to skip everything except keyframes
    stream.codec_context.skip_frame = "NONKEY"
    # Convert seconds to the stream's PTS units
    start_pts = int(start_s / stream.time_base)
    end_pts = int(end_s / stream.time_base)
    container.seek(start_pts, stream=stream)
    print(f"episode_index={row['episode_index']}")
    print(f"blob_size_bytes={blob.size()}")
    print(f"seek_start_seconds={start_s:.3f}")
    print(f"seek_end_seconds={end_s:.3f}")
    print(f"seek_start_pts={start_pts}")
    print(f"seek_end_pts={end_pts}")
blob.close()
```
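The seconds-to-PTS conversion used in the snippet can be studied in isolation. The sketch below assumes a hypothetical stream time base of 1/90000 (a common value for MPEG streams, not necessarily the one used by these videos) and uses `round` rather than `int` truncation, so that floating-point error cannot shift the result down by one tick. The helper names are illustrative only.

```python
from fractions import Fraction


def seconds_to_pts(seconds: float, time_base: Fraction) -> int:
    """Convert a time in seconds to presentation-timestamp ticks."""
    return round(seconds / time_base)


def pts_to_seconds(pts: int, time_base: Fraction) -> float:
    """Convert presentation-timestamp ticks back to seconds."""
    return float(pts * time_base)


# Hypothetical time base: one tick is 1/90000 of a second.
time_base = Fraction(1, 90_000)

start_pts = seconds_to_pts(1.0, time_base)
end_pts = seconds_to_pts(1.5, time_base)
print(start_pts, end_pts)
print(pts_to_seconds(end_pts - start_pts, time_base))
# Prints:
# 90000 135000
# 0.5
```

With PyAV, `stream.time_base` is already a `Fraction`, so the same helpers should apply to a real stream without changes.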
### LanceDB search
LanceDB users can also interface with the Lance dataset on the Hub. The key step is to
connect to the dataset repo and open the relevant table.
```py
import lancedb
db = lancedb.connect("hf://datasets/lance-format/lerobot-xvla-soft-fold/data")
tbl = db.open_table("episodes")
# Search without any parameters
results = (
    tbl.search()
    .select(
        [
            "episode_index",
            "observation_images_cam_high_from_timestamp",
            "observation_images_cam_high_to_timestamp",
        ]
    )
    .limit(3)
    .to_list()
)
for result in results:
    print(
        f"{result['episode_index']} | {result['observation_images_cam_high_from_timestamp']} | {result['observation_images_cam_high_to_timestamp']}"
    )
# Returns:
# 0 | 0.0 | 122.95
# 1 | 122.95 | 230.65
# 2 | 230.65 | 340.0
```
### Download
If you need to make modifications to the data or work with the raw files directly, you can do a
full download of the dataset locally.
> **⚠️ Large dataset download**
> The full dataset is larger than 50 GB, so make sure you have enough free disk space.
```bash
uv run hf download lance-format/lerobot-xvla-soft-fold --repo-type dataset --local-dir .
```
Provided by:
lance-format



