
Lennittus/DESPITE

Source: Hugging Face | Updated: 2026-04-08 | Indexed: 2026-04-12
Download link:
https://hf-mirror.com/datasets/Lennittus/DESPITE
Resource description:
---
license: mit
language:
- en
tags:
- robotics
- task-planning
- safety
- pddl
- benchmark
- embodied-ai
size_categories:
- 10K<n<100K
viewer: false
---

# DESPITE: Deterministic Evaluation of Safe Planning In embodied Task Execution

[![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://despite-safety.github.io/)
[![Code](https://img.shields.io/badge/GitHub-Code-black?logo=github)](https://github.com/taozhang1004/DESPITE)
[![Dataset](https://img.shields.io/badge/HuggingFace-Dataset-yellow?logo=huggingface)](https://huggingface.co/datasets/lennittus/DESPITE)
[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)

A benchmark for evaluating large language models (LLMs) on embodied safe task planning.

**Paper:** "Using large language models for embodied planning introduces systematic safety risks"

**Authors:** Tao Zhang, Kaixian Qu, Zhibin Li, Jiajun Wu, Marco Hutter, Manling Li, Fan Shi

## Quick Start

```bash
# Clone the dataset
git clone https://huggingface.co/datasets/lennittus/DESPITE
cd DESPITE

# Extract tasks (required for running evaluations)
tar -xzf tasks.tar.gz

# Optional: extract benchmark results and generation info
tar -xzf benchmark_results.tar.gz
tar -xzf generation_info.tar.gz
```

## Dataset Structure

After extraction:

```
DESPITE/
├── tasks/{split}/{subset}/{task_id}/
│   ├── code.py          # Entry point for planning and evaluation
│   ├── domain.pddl      # PDDL domain
│   ├── problem.pddl     # PDDL problem
│   └── metadata.json    # Danger formalization + reference plans
├── benchmark_results/{split}/{subset}/{task_id}.json
└── generation_info/{split}/{subset}/{task_id}.json
```

## Splits

| Split | Subset | Tasks | Description |
|-------|--------|-------|-------------|
| `full` | `easy` | 11,235 | Standard difficulty |
| `full` | `hard` | 1,044 | Complex tasks (main evaluation in paper) |
| `sampled` | `easy-100` | 100 | Quick evaluation subset |
| `sampled` | `hard-100` | 100 | Quick evaluation subset |
| `sampled` | `redundancy/base` | 50 | Base tasks for redundancy analysis |
| `sampled` | `redundancy/variants` | 300 | Variants with redundant actions added |

## Data Sources

Tasks derived from [ALFRED](https://askforalfred.com/), [BDDL](https://behavior.stanford.edu/), [VirtualHome](http://virtual-home.org/), [NormBank](https://github.com/SALT-NLP/normbank), and [NEISS](https://www.cpsc.gov/Research--Statistics/NEISS-Injury-Data).

## Citation

```bibtex
coming soon
```

## License

MIT License. See original dataset repositories for their respective terms.
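## Example: enumerating extracted tasks

The `tasks/{split}/{subset}/{task_id}/` layout described under Dataset Structure can be walked with a few lines of Python, which is handy for checking that an extraction is complete before running evaluations. The sketch below is not part of the DESPITE codebase: the helper names `iter_tasks`, `missing_files`, and `count_tasks` are hypothetical, and it assumes only the four file names listed in the directory tree. It does not read `metadata.json`, since its schema is not documented here.

```python
from collections import Counter
from pathlib import Path

# File names taken from the directory tree in the dataset card.
EXPECTED_FILES = ("code.py", "domain.pddl", "problem.pddl", "metadata.json")


def iter_tasks(tasks_root):
    """Yield (split, subset, task_id, task_dir) for each task directory.

    Assumption: a directory is a task if it contains metadata.json.
    The subset may span several path components (e.g. "redundancy/base"),
    so it is everything between the split and the task id.
    """
    root = Path(tasks_root)
    for meta in sorted(root.rglob("metadata.json")):
        task_dir = meta.parent
        parts = task_dir.relative_to(root).parts
        if len(parts) < 3:
            continue  # does not match {split}/{subset}/{task_id}
        split, task_id = parts[0], parts[-1]
        subset = "/".join(parts[1:-1])
        yield split, subset, task_id, task_dir


def missing_files(task_dir):
    """Return the expected file names that are absent from task_dir."""
    return [name for name in EXPECTED_FILES if not (Path(task_dir) / name).is_file()]


def count_tasks(tasks_root):
    """Count task directories per (split, subset) pair."""
    return Counter((split, subset) for split, subset, _, _ in iter_tasks(tasks_root))
```

Calling `count_tasks("tasks")` from inside the cloned `DESPITE/` directory, after extracting `tasks.tar.gz`, gives per-subset counts that can be compared against the Splits table. Any task for which `missing_files` returns a non-empty list is worth re-extracting.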
Provider:
Lennittus