DeepPlanning

Name: DeepPlanning
Creator: maas
Published: 2026-05-16 17:59:53
License: No description available

ModelScope community · Updated 2026-05-16 · Indexed 2026-02-07

Download link:

https://modelscope.cn/datasets/Qwen/DeepPlanning

Resource description:

# DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints

DeepPlanningBench is a challenging benchmark for evaluating the long-horizon agentic planning capabilities of large language models (LLMs) with verifiable constraints. It features realistic multi-day travel planning and multi-product shopping tasks that require proactive information acquisition, local constrained reasoning, and global constrained optimization.

🌐 Website: https://qwenlm.github.io/Qwen-Agent/en/benchmarks/deepplanning/

📄 Paper: https://arxiv.org/abs/2601.18137

## Introduction

While agent evaluation has shifted toward long-horizon tasks, most benchmarks still emphasize local, step-level reasoning rather than the global constrained optimization (e.g., time and financial budgets) that demands genuine planning ability. DeepPlanning addresses this gap by introducing practical long-horizon agent planning scenarios that require:

- **Proactive Information Acquisition**: Actively gathering information through API calls to discover hidden environment states
- **Local Constrained Reasoning**: Satisfying step-level logic and specific requirements
- **Global Constrained Optimization**: Managing holistic boundaries like total budget caps and multi-day time feasibility

The benchmark includes two main domains:

- **Travel Planning**: Multi-day trip organization with tightly coupled time, location, and budget constraints
- **Shopping Planning**: Combinatorial optimization problems to find optimal products while maximizing discount utility

## Citation

If you find our work useful, please consider citing:

```bibtex
@article{deepplanning,
  title={DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints},
  author={
    Yinger Zhang and Shutong Jiang and Renhao Li and Jianhong Tu and
    Yang Su and Lianghao Deng and Xudong Guo and Chenxu Lv and Junyang Lin
  },
  journal={arXiv preprint arXiv:2601.18137},
  year={2026}
}
```
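
## Illustrative constraint check

To make the idea of "verifiable constraints" from the Introduction concrete, the following is a minimal sketch of how a plan might be checked against one local constraint (a valid time window per activity) and two global ones (no overlapping activities within a day, and a total budget cap). The `Activity` type, its fields, and `check_plan` are hypothetical names chosen for illustration. They are not the benchmark's data schema or its official evaluator.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """Hypothetical plan item; not the benchmark's actual schema."""
    day: int      # 1-based day index within the trip
    start: int    # start time, minutes since midnight
    end: int      # end time, minutes since midnight
    cost: float   # cost in an arbitrary currency unit


def check_plan(activities, budget_cap):
    """Return a list of violated constraints (empty if the plan is feasible)."""
    violations = []

    # Local constraint: every activity must end after it starts.
    for a in activities:
        if a.end <= a.start:
            violations.append(f"invalid time window on day {a.day}")

    # Global time feasibility: activities on the same day must not overlap.
    by_day = {}
    for a in activities:
        by_day.setdefault(a.day, []).append(a)
    for day in sorted(by_day):
        items = sorted(by_day[day], key=lambda a: a.start)
        for prev, cur in zip(items, items[1:]):
            if cur.start < prev.end:
                violations.append(f"overlap on day {day}")

    # Global budget constraint: total cost must stay within the cap.
    total = sum(a.cost for a in activities)
    if total > budget_cap:
        violations.append(f"budget exceeded: {total:.2f} > {budget_cap:.2f}")

    return violations


if __name__ == "__main__":
    plan = [
        Activity(day=1, start=540, end=660, cost=40.0),
        Activity(day=1, start=600, end=720, cost=30.0),  # overlaps the first
        Activity(day=2, start=600, end=700, cost=50.0),
    ]
    for violation in check_plan(plan, budget_cap=100.0):
        print(violation)
```

Each step in this sketch can look locally reasonable while the plan as a whole still fails: the overlap and the budget overrun only become visible when all activities are considered together, which is the kind of global check the benchmark description emphasizes.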

Provider:

maas

Created:

2026-01-14
