
Zoorao/HTEWorld

Source: Hugging Face · Updated 2026-05-20 · Indexed 2026-05-31
Download link:
https://hf-mirror.com/datasets/Zoorao/HTEWorld
Description:
---
pretty_name: HTEWorld
license: cc-by-nc-4.0
language:
- en
tags:
- robotics
- embodied-ai
- world-modeling
- video-generation
- behavior-1k
size_categories:
- 100K<n<1M
---

# HTEWorld

HTEWorld is a benchmark for long-horizon world modeling in hybrid embodied tasks. In these tasks, navigation and manipulation instructions are interleaved over extended trajectories.

This repository contains the data released for WEM (World-Ego Modeling). It has two splits:

- `train/`: training annotations for BEHAVIOR-1K videos.
- `eval/`: complete evaluation trajectories, including videos and prompts.

## Dataset Structure

```text
HTEWorld/
├── train/
│   ├── task-0000/
│   │   ├── episode_*/
│   │   │   ├── clip_*/
│   │   │   │   ├── caption.txt
│   │   │   │   └── mask.npz
│   │   │   └── ...
│   │   └── ...
│   └── ...
└── eval/
    ├── task_001/
    │   ├── first_frame.jpg
    │   ├── video.mp4
    │   ├── prompts.txt
    │   └── prompt_nav_manip.txt
    └── ...
```

## Splits

### Train

The training split contains WEM annotations for BEHAVIOR-1K training videos. It does not include the raw training videos.

The released training annotations cover:

- `task-0000` to `task-0008`
- `task-0010`

The first five episodes of each task are excluded from the training annotations. Empty clips, meaning those without complete annotations, are omitted.

Each valid clip contains:

- `caption.txt`: text instruction annotation.
- `mask.npz`: motion mask annotation used for WEM training.

To use this split for training, first download the corresponding BEHAVIOR-1K videos and preprocess them with the WEM preprocessing tools. Then merge these annotations into the processed video directory.

### Eval

The evaluation split contains 300 benchmark trajectories. Each task directory includes:

- `first_frame.jpg`: initial conditioning frame.
- `video.mp4`: full ground-truth trajectory.
- `prompts.txt`: one instruction per line.
- `prompt_nav_manip.txt`: navigation/manipulation phase labels aligned with `prompts.txt`.
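The split layouts above can be read with a few lines of Python. What follows is a minimal sketch and is not part of the WEM codebase. `load_eval_task` and `load_train_clip` are hypothetical helper names. The sketch makes three assumptions: text files are UTF-8, blank lines carry no instruction, and every array in `mask.npz` should be returned, because the card does not name the arrays inside it.

```python
"""Minimal sketch for reading one HTEWorld eval task and one train clip.

Assumptions (not confirmed by the dataset card): text files are UTF-8,
blank lines carry no instruction, and the array names inside mask.npz are
unknown, so every array in the archive is returned.
"""
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

import numpy as np


@dataclass
class EvalTask:
    first_frame: Path                 # initial conditioning frame
    video: Path                       # full ground-truth trajectory
    steps: list[tuple[str, str]]      # (instruction, nav/manip phase label) pairs


def _read_lines(path: Path) -> list[str]:
    # One entry per non-empty line, with surrounding whitespace stripped.
    text = path.read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_eval_task(task_dir: str | Path) -> EvalTask:
    """Hypothetical helper: pair each prompt with its phase label."""
    task_dir = Path(task_dir)
    prompts = _read_lines(task_dir / "prompts.txt")
    labels = _read_lines(task_dir / "prompt_nav_manip.txt")
    # The card says the phase labels are aligned with prompts.txt,
    # so a length mismatch signals a corrupted or partial download.
    if len(prompts) != len(labels):
        raise ValueError(
            f"{task_dir}: {len(prompts)} prompts but {len(labels)} phase labels"
        )
    return EvalTask(
        first_frame=task_dir / "first_frame.jpg",
        video=task_dir / "video.mp4",
        steps=list(zip(prompts, labels)),
    )


def load_train_clip(clip_dir: str | Path) -> tuple[str, dict[str, np.ndarray]]:
    """Hypothetical helper: return the caption and all arrays in mask.npz."""
    clip_dir = Path(clip_dir)
    caption = (clip_dir / "caption.txt").read_text(encoding="utf-8").strip()
    # Copy every array out of the archive so the file handle can be closed.
    with np.load(clip_dir / "mask.npz") as archive:
        masks = {name: archive[name] for name in archive.files}
    return caption, masks
```

For example, `load_eval_task("HTEWorld/eval/task_001").steps` yields the instruction/phase pairs for one trajectory. The helper raises on a length mismatch rather than silently truncating the longer list through `zip`.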
The official evaluator reads the full `video.mp4` and segments it according to the fixed HTEWorld evaluation protocol.

## Download

```bash
huggingface-cli download Zoorao/HTEWorld \
  --repo-type dataset \
  --local-dir HTEWorld
```

## Evaluation

Generate predictions with the WEM codebase:

```bash
python generate.py \
  --ckpt_dir <WEM_CHECKPOINT_DIR> \
  --wan_ckpt_dir <WAN2.2_CHECKPOINT_DIR> \
  --qwen_ckpt_dir <QWEN3_VL_CHECKPOINT_DIR> \
  --benchmark_root HTEWorld/eval \
  --output_dir <PREDICTION_ROOT>
```

Then compute the formal HTEWorld metrics:

```bash
python eval/evaluate.py \
  --output-root <PREDICTION_ROOT> \
  --benchmark-root HTEWorld/eval \
  --metrics formal \
  --model-name <MODEL_NAME>
```

The six formal metrics are RCBD, LPSA, CISR, PMPA, CPDM, and FPHSC.

## License

This dataset is released for non-commercial research use under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). Users should also respect the terms of the underlying BEHAVIOR-1K data.

## Citation

```bibtex
@article{wem2026,
  title={World-Ego Modeling for Long-Horizon Evolution in Hybrid Embodied Tasks},
  author={Lin, Zuyao and Zhang, Jianhui and Jia, Peidong and Zhao, Xiaoguang and Zhang, Shanghang and Chen, Xingyu},
  journal={arXiv preprint arXiv:2605.19957},
  year={2026}
}
```
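After downloading, a quick check that `eval/` matches the layout described in this card can catch an interrupted download before a long generation run. This is a minimal sketch and makes these assumptions:

- The split has one directory per task, each holding the four listed files.
- The split contains 300 tasks, as stated in the card.
- Hidden directories can be ignored.

`check_eval_split` is a hypothetical helper, not part of the official evaluator. It does not reproduce the evaluator's segmentation protocol.

```python
"""Post-download sanity check for the HTEWorld eval split (a sketch).

Assumes the layout given in the dataset card: one sub-directory per task
under eval/, each holding the four files below. The default expected count
of 300 comes from the card; pass expected=None to skip that check.
"""
from __future__ import annotations

from pathlib import Path

REQUIRED_FILES = ("first_frame.jpg", "video.mp4", "prompts.txt", "prompt_nav_manip.txt")


def _count_nonblank(path: Path) -> int:
    # Assumption: blank lines are not instructions or labels.
    return sum(1 for line in path.read_text(encoding="utf-8").splitlines() if line.strip())


def check_eval_split(eval_root: str | Path, expected: int | None = 300) -> list[str]:
    """Return human-readable problems; an empty list means the split looks complete."""
    eval_root = Path(eval_root)
    # Skip hidden directories such as download caches.
    task_dirs = sorted(
        p for p in eval_root.iterdir() if p.is_dir() and not p.name.startswith(".")
    )
    problems: list[str] = []
    if expected is not None and len(task_dirs) != expected:
        problems.append(f"found {len(task_dirs)} task directories, expected {expected}")
    for task in task_dirs:
        missing = [name for name in REQUIRED_FILES if not (task / name).is_file()]
        if missing:
            problems.append(f"{task.name}: missing {', '.join(missing)}")
            continue
        # The phase labels should line up one-to-one with the prompts.
        n_prompts = _count_nonblank(task / "prompts.txt")
        n_labels = _count_nonblank(task / "prompt_nav_manip.txt")
        if n_prompts != n_labels:
            problems.append(f"{task.name}: {n_prompts} prompts vs {n_labels} phase labels")
    return problems
```

A typical call after the `huggingface-cli download` step is `for problem in check_eval_split("HTEWorld/eval"): print(problem)`. The helper collects every problem instead of stopping at the first, so one run reports all incomplete tasks.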
Provided by: Zoorao