DeepPlanning
Source: ModelScope community. Updated 2026-05-16; indexed 2026-02-07.
Download link:
https://modelscope.cn/datasets/Qwen/DeepPlanning
Resource description:
# DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints
DeepPlanningBench is a challenging benchmark for evaluating the long-horizon agentic planning capabilities of large language models (LLMs) under verifiable constraints. It features realistic multi-day travel planning and multi-product shopping tasks that require proactive information acquisition, local constrained reasoning, and global constrained optimization.
🌐 Website: https://qwenlm.github.io/Qwen-Agent/en/benchmarks/deepplanning/
📄 Paper: https://arxiv.org/abs/2601.18137
## Introduction
While agent evaluation has shifted toward long-horizon tasks, most benchmarks still emphasize local, step-level reasoning rather than the global constrained optimization (e.g., time and financial budgets) that demands genuine planning ability. DeepPlanning addresses this gap by introducing practical long-horizon agent planning scenarios that require:
- **Proactive Information Acquisition**: Actively gathering information through API calls to discover hidden environment states
- **Local Constrained Reasoning**: Satisfying step-level logic and specific requirements
- **Global Constrained Optimization**: Managing holistic boundaries like total budget caps and multi-day time feasibility
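To make the difference between local and global constraints concrete, here is a minimal sketch of a plan checker. It is not the benchmark's actual verifier: the `Activity` fields, the `check_plan` function, and the specific rules (valid time windows, a total budget cap, no overlapping activities within a day) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    # Hypothetical schema, not the benchmark's data format.
    name: str
    day: int      # 1-based day index within the trip
    start: int    # minutes after midnight
    end: int      # minutes after midnight
    cost: float


def check_plan(activities, budget_cap, num_days):
    """Return a list of constraint violations (empty if the plan is feasible)."""
    violations = []

    # Local constraints: each activity must be valid on its own.
    for a in activities:
        if not (1 <= a.day <= num_days):
            violations.append(f"{a.name}: day {a.day} is outside the trip")
        if not (0 <= a.start < a.end <= 24 * 60):
            violations.append(f"{a.name}: invalid time window")

    # Global constraint: the total cost must stay within the budget cap.
    total = sum(a.cost for a in activities)
    if total > budget_cap:
        violations.append(f"total cost {total:.2f} exceeds budget {budget_cap:.2f}")

    # Global constraint: activities on the same day must not overlap.
    by_day = {}
    for a in activities:
        by_day.setdefault(a.day, []).append(a)
    for day, items in sorted(by_day.items()):
        items.sort(key=lambda a: a.start)
        for prev, nxt in zip(items, items[1:]):
            if nxt.start < prev.end:
                violations.append(f"day {day}: {prev.name} overlaps {nxt.name}")

    return violations
```

Each activity can pass its local checks while the plan as a whole still fails, which is why the global checks need the complete plan rather than a single step.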
The benchmark includes two main domains:
- **Travel Planning**: Multi-day trip organization with tightly coupled time, location, and budget constraints
- **Shopping Planning**: Combinatorial optimization problems to find optimal products while maximizing discount utility
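The shopping domain can be pictured as a small combinatorial search. The sketch below is an illustration under assumed rules, not the benchmark's task format: it picks one product per category, applies at most one threshold coupon (a hypothetical "spend at least X, save Y" rule), and enumerates all combinations by brute force. The function name `best_cart` and the data shapes are invented for this example.

```python
from itertools import product


def best_cart(options, coupons):
    """Find the cheapest cart with one product per category.

    options: dict mapping category -> list of (product_name, price)
    coupons: list of (min_subtotal, discount); at most one coupon applies
    Returns (final_price, [product_name, ...]) for the best combination.
    """
    best = None
    for combo in product(*options.values()):
        subtotal = sum(price for _, price in combo)
        # Apply the largest discount whose spending threshold is met, if any.
        discount = max((d for t, d in coupons if subtotal >= t), default=0)
        total = subtotal - discount
        if best is None or total < best[0]:
            best = (total, [name for name, _ in combo])
    return best


options = {
    "shoes": [("A", 60), ("B", 75)],
    "jacket": [("C", 35), ("D", 30)],
}
coupons = [(100, 25)]  # hypothetical: 25 off when the subtotal reaches 100
print(best_cart(options, coupons))  # (80, ['B', 'D'])
```

Choosing the cheapest item in each category (A and D, subtotal 90) misses the coupon threshold, while B and D reach it and cost 80 after the discount. This is the kind of interaction between individual choices and a global discount rule that a step-by-step greedy choice does not capture.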
## Citation
If you find our work useful, please consider citing:
```bibtex
@article{deepplanning,
title={DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints},
author={
Yinger Zhang and Shutong Jiang and Renhao Li and Jianhong Tu and Yang Su and
Lianghao Deng and Xudong Guo and Chenxu Lv and Junyang Lin
},
journal={arXiv preprint arXiv:2601.18137},
year={2026}
}
```
Provider:
maas
Created:
2026-01-14



