
DeepPlanning

ModelScope Community · Updated 2026-05-16 · Indexed 2026-02-07
Download link:
https://modelscope.cn/datasets/Qwen/DeepPlanning
Resource description:
# DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints

DeepPlanningBench is a challenging benchmark for evaluating long-horizon agentic planning capabilities of large language models (LLMs) with verifiable constraints. It features realistic multi-day travel planning and multi-product shopping tasks that require proactive information acquisition, local constrained reasoning, and global constrained optimization.

🌐 Website: https://qwenlm.github.io/Qwen-Agent/en/benchmarks/deepplanning/

📄 Paper: https://arxiv.org/abs/2601.18137

## Introduction

While agent evaluation has shifted toward long-horizon tasks, most benchmarks still emphasize local, step-level reasoning rather than the global constrained optimization (e.g., time and financial budgets) that demands genuine planning ability. DeepPlanning addresses this gap by introducing practical long-horizon agent planning scenarios that require:

- **Proactive Information Acquisition**: Actively gathering information through API calls to discover hidden environment states
- **Local Constrained Reasoning**: Satisfying step-level logic and specific requirements
- **Global Constrained Optimization**: Managing holistic boundaries like total budget caps and multi-day time feasibility

The benchmark includes two main domains:

- **Travel Planning**: Multi-day trip organization with tightly coupled time, location, and budget constraints
- **Shopping Planning**: Combinatorial optimization problems to find optimal products while maximizing discount utility

## Citation

If you find our work useful, please consider citing:

```bibtex
@article{deepplanning,
  title={DeepPlanning: Benchmarking Long-Horizon Agentic Planning with Verifiable Constraints},
  author={Yinger Zhang and Shutong Jiang and Renhao Li and Jianhong Tu and Yang Su and Lianghao Deng and Xudong Guo and Chenxu Lv and Junyang Lin},
  journal={arXiv preprint arXiv:2601.18137},
  year={2026}
}
```
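To make the idea of "verifiable constraints" from the Introduction more concrete, here is a minimal Python sketch of how a travel plan could be checked against one local constraint and two global ones (same-day time feasibility and a total budget cap). This is an illustration only, not the benchmark's evaluation code: the `Activity` fields, the `check_plan` function, and all numbers are hypothetical and do not come from the DeepPlanning release.

```python
from dataclasses import dataclass


@dataclass
class Activity:
    """One scheduled item in a hypothetical travel plan."""
    day: int      # 1-based day index within the trip
    start: int    # start time, minutes after midnight
    end: int      # end time, minutes after midnight
    cost: float   # price of this activity


def check_plan(activities, budget_cap):
    """Return a list of violated constraints (empty list means the plan passes)."""
    violations = []

    # Local constraint: every activity must end after it starts.
    for a in activities:
        if a.end <= a.start:
            violations.append(f"day {a.day}: activity ends before it starts")

    # Global constraint 1: activities on the same day must not overlap.
    by_day = {}
    for a in activities:
        by_day.setdefault(a.day, []).append(a)
    for day, items in sorted(by_day.items()):
        items.sort(key=lambda a: a.start)
        for prev, nxt in zip(items, items[1:]):
            if nxt.start < prev.end:
                violations.append(f"day {day}: overlapping activities")

    # Global constraint 2: total cost must stay within the budget cap.
    total = sum(a.cost for a in activities)
    if total > budget_cap:
        violations.append(
            f"total cost {total:.2f} exceeds budget {budget_cap:.2f}"
        )

    return violations


# Hypothetical plan: two overlapping items on day 1, and a total over budget.
example_plan = [
    Activity(day=1, start=540, end=660, cost=40.0),
    Activity(day=1, start=630, end=720, cost=25.0),
    Activity(day=2, start=600, end=700, cost=50.0),
]
for problem in check_plan(example_plan, budget_cap=100.0):
    print(problem)
```

Each activity in the example is valid on its own; the plan fails only when the items are considered together. That is the distinction the Introduction draws between step-level reasoning and global constrained optimization.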

Provider:
maas
Created:
2026-01-14