
Language-Guided Long Horizon Manipulation with LLM-based Planning and Visual Perception

Indexed from IEEE on 2026-04-17
Download link:
https://ieee-dataport.org/documents/language-guided-long-horizon-manipulation-llm-based-planning-and-visual-perception
Description:
Language-guided long-horizon manipulation of deformable objects presents significant challenges due to high degrees of freedom, complex dynamics, and the need for accurate vision-language grounding. In this work, we focus on multi-step cloth folding, a representative deformable-object manipulation task that requires both structured long-horizon planning and fine-grained visual perception. To this end, we propose a unified framework that integrates a Large Language Model (LLM)-based planner, a Vision-Language Model (VLM)-based perception system, and a task execution module. Specifically, the LLM-based planner decomposes high-level language instructions into low-level action primitives, bridging the semantic–execution gap, aligning perception with action, and enhancing generalization. The VLM-based perception module employs a SigLIP2-driven architecture with a novel bidirectional cross-attention fusion mechanism and Weight-Decomposed Low-Rank Adaptation (DoRA)-based fine-tuning to achieve language-conditioned fine-grained visual grounding. Experiments in both simulation and real-world settings demonstrate the method's effectiveness. In simulation, it outperforms state-of-the-art (SOTA) baselines, achieving improvements of 2.23%, 1.87%, and 33.3% on seen instructions, unseen instructions, and unseen tasks, respectively. On a real robot, it robustly executes multi-step folding sequences from language instructions across diverse cloth materials and configurations, demonstrating strong generalization in practical scenarios.
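The description mentions DoRA-based fine-tuning without giving details. As a rough illustration of the general DoRA idea only (not the authors' implementation), the sketch below merges a frozen weight W0 with a low-rank update B·A and rescales each column to a learned magnitude m, i.e. m · (W0 + B·A) / ||W0 + B·A|| column by column. The matrix sizes, values, and helper names are hypothetical, and plain Python lists are used only to keep the example dependency-free.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def column_norms(w):
    """Return the L2 norm of each column of w."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*w)]


def dora_weight(w0, b, a, m):
    """Merge a DoRA update: m * (W0 + B A) / ||W0 + B A||, per column."""
    delta = matmul(b, a)  # low-rank update B A
    directed = [
        [w + d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]
    norms = column_norms(directed)
    # Normalise each column to unit length, then scale by its magnitude m_j.
    return [
        [m_j * v / n_j for v, m_j, n_j in zip(row, m, norms)]
        for row in directed
    ]


# Hypothetical toy sizes: a 3x2 frozen weight with a rank-1 update.
w0 = [[1.0, 2.0], [0.0, -1.0], [2.0, 2.0]]
b = [[0.0], [0.0], [0.0]]  # B starts at zero, so the update is initially a no-op
a = [[0.5, -0.5]]
m = column_norms(w0)       # magnitude starts at the column norms of W0

merged = dora_weight(w0, b, a, m)
```

With B initialised to zero and m set to the column norms of W0, the merged weight reproduces W0 up to floating-point rounding, so fine-tuning starts from the pretrained behaviour. During training only m, B, and A would be updated while W0 stays frozen.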
Provided by:
Yanmin Zhou