Lennittus/DESPITE
Source: Hugging Face. Updated 2026-04-08; indexed 2026-04-12.
Download link: https://hf-mirror.com/datasets/Lennittus/DESPITE
Resource description:
---
license: mit
language:
- en
tags:
- robotics
- task-planning
- safety
- pddl
- benchmark
- embodied-ai
size_categories:
- 10K<n<100K
viewer: false
---
# DESPITE: Deterministic Evaluation of Safe Planning In embodied Task Execution
[Project page](https://despite-safety.github.io/)
[Code](https://github.com/taozhang1004/DESPITE)
[Dataset](https://huggingface.co/datasets/lennittus/DESPITE)
[License: MIT](https://opensource.org/licenses/MIT)
A benchmark for evaluating large language models (LLMs) on embodied safe task planning.
**Paper:** "Using large language models for embodied planning introduces systematic safety risks"
**Authors:** Tao Zhang, Kaixian Qu, Zhibin Li, Jiajun Wu, Marco Hutter, Manling Li, Fan Shi
## Quick Start
```bash
# Clone the dataset
git clone https://huggingface.co/datasets/lennittus/DESPITE
cd DESPITE
# Extract tasks (required for running evaluations)
tar -xzf tasks.tar.gz
# Optional: extract benchmark results and generation info
tar -xzf benchmark_results.tar.gz
tar -xzf generation_info.tar.gz
```
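For setups where `tar` is unavailable, or where extraction should happen inside a Python script, the same steps can be sketched with the standard library. This is a minimal sketch, not part of the dataset's tooling: the function name `extract_archives` and the `REQUIRED`/`OPTIONAL` lists are hypothetical, and only the archive names come from the commands above.

```python
import tarfile
from pathlib import Path

# Archive names are taken from the Quick Start commands; the split into
# REQUIRED and OPTIONAL mirrors the comments there.
REQUIRED = ["tasks.tar.gz"]
OPTIONAL = ["benchmark_results.tar.gz", "generation_info.tar.gz"]


def extract_archives(repo_dir, include_optional=True):
    """Extract the dataset archives inside repo_dir; return the names extracted."""
    repo = Path(repo_dir)
    names = REQUIRED + (OPTIONAL if include_optional else [])
    extracted = []
    for name in names:
        archive = repo / name
        if not archive.is_file():
            if name in REQUIRED:
                raise FileNotFoundError(f"missing required archive: {archive}")
            continue  # optional archives may be absent or skipped
        # Use the safer "data" extraction filter where this Python supports it.
        kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(repo, **kwargs)
        extracted.append(name)
    return extracted
```

Calling `extract_archives("DESPITE")` from the directory that contains the clone would unpack everything in place; passing `include_optional=False` unpacks only the tasks.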
## Dataset Structure
After extraction:
```
DESPITE/
├── tasks/{split}/{subset}/{task_id}/
│ ├── code.py # Entry point for planning and evaluation
│ ├── domain.pddl # PDDL domain
│ ├── problem.pddl # PDDL problem
│ └── metadata.json # Danger formalization + reference plans
├── benchmark_results/{split}/{subset}/{task_id}.json
└── generation_info/{split}/{subset}/{task_id}.json
```
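The layout above can be walked with a few lines of Python. The sketch below is illustrative only: `iter_tasks` and `load_task` are hypothetical helpers, not functions shipped with the dataset, and `metadata.json` is loaded as an opaque JSON object because its schema is not described here.

```python
import json
from pathlib import Path


def iter_tasks(root, split, subset):
    """Yield (task_id, task_dir) for every task directory in a split/subset."""
    # subset may contain a slash, e.g. "redundancy/base"; Path handles that.
    base = Path(root) / "tasks" / split / subset
    for task_dir in sorted(p for p in base.iterdir() if p.is_dir()):
        yield task_dir.name, task_dir


def load_task(task_dir):
    """Read the PDDL files as text and metadata.json as a Python object."""
    task_dir = Path(task_dir)
    return {
        "domain": (task_dir / "domain.pddl").read_text(encoding="utf-8"),
        "problem": (task_dir / "problem.pddl").read_text(encoding="utf-8"),
        "metadata": json.loads(
            (task_dir / "metadata.json").read_text(encoding="utf-8")
        ),
    }
```

For example, `for task_id, path in iter_tasks("DESPITE", "sampled", "hard-100"): task = load_task(path)` would visit each task in that subset in sorted order. Running a task's `code.py` is left out, since its interface is not documented in this card.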
## Splits
| Split | Subset | Tasks | Description |
|-------|--------|-------|-------------|
| `full` | `easy` | 11,235 | Standard difficulty |
| `full` | `hard` | 1,044 | Complex tasks (main evaluation in paper) |
| `sampled` | `easy-100` | 100 | Quick evaluation subset |
| `sampled` | `hard-100` | 100 | Quick evaluation subset |
| `sampled` | `redundancy/base` | 50 | Base tasks for redundancy analysis |
| `sampled` | `redundancy/variants` | 300 | Variants with redundant actions added |
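The task counts in this table give a simple way to check that extraction completed. The following sketch assumes each task is one directory directly under `tasks/{split}/{subset}/`, as the structure listing suggests; `EXPECTED_TASKS`, `count_tasks`, and `find_mismatches` are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Counts copied from the Splits table.
EXPECTED_TASKS = {
    ("full", "easy"): 11235,
    ("full", "hard"): 1044,
    ("sampled", "easy-100"): 100,
    ("sampled", "hard-100"): 100,
    ("sampled", "redundancy/base"): 50,
    ("sampled", "redundancy/variants"): 300,
}


def count_tasks(root, split, subset):
    """Count task directories found on disk for one split/subset."""
    base = Path(root) / "tasks" / split / subset
    if not base.is_dir():
        return 0
    return sum(1 for p in base.iterdir() if p.is_dir())


def find_mismatches(root):
    """Return {(split, subset): (expected, found)} where the counts differ."""
    mismatches = {}
    for (split, subset), expected in EXPECTED_TASKS.items():
        found = count_tasks(root, split, subset)
        if found != expected:
            mismatches[(split, subset)] = (expected, found)
    return mismatches
```

An empty result from `find_mismatches("DESPITE")` would indicate that every subset has the number of task directories listed in the table.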
## Data Sources
Tasks are derived from [ALFRED](https://askforalfred.com/), [BDDL](https://behavior.stanford.edu/), [VirtualHome](http://virtual-home.org/), [NormBank](https://github.com/SALT-NLP/normbank), and [NEISS](https://www.cpsc.gov/Research--Statistics/NEISS-Injury-Data).
## Citation
```bibtex
coming soon
```
## License
MIT License. See the original source dataset repositories for their respective terms.
Provider: Lennittus



