Prompt tuning with preference ranking for few-shot pre-trained decision transformer
中国科学数据 (China Scientific Data) | Updated 2026-01-04 | Indexed 2026-04-25
Download link:
https://www.sciengine.com/AA/doi/10.1007/s11432-024-4545-1
Resource description:
Prompt tuning has emerged as a promising method for adapting pre-trained models to downstream tasks or aligning with human preferences. Prompt learning is widely used in natural language processing (NLP) but has limited applicability to reinforcement learning (RL) due to the complex physical meaning and environment-specific information contained within RL prompts. Directly extending prompt-tuning approaches to RL is challenging because RL prompts guide agent behavior based on environmental modeling and analysis, rather than adjusting the prompt format for downstream tasks as widely used in NLP. In this work, we propose the prompt-tuning decision transformer (DT) algorithm to address these challenges by using trajectory segments as prompts to guide RL agents in acquiring environmental information and optimizing prompts via black-box tuning to enhance their ability to contain more relevant information, thereby enabling agents to make better decisions. Our approach involves randomly sampling a Gaussian distribution to fine-tune the elements of the prompt trajectory and using the preference ranking function to find the optimization direction, thereby providing more informative prompts and guiding the agent toward specific preferences in the target environment. Extensive experiments show that with only 0.03% of the parameters learned, Prompt-Tuning DT achieves comparable or even better performance than full-model fine-tuning in few-shot settings. Our research represents a pioneering contribution to the development of prompt-tuning techniques within RL, offering a promising avenue for optimizing large-scale pre-trained RL agents for tasks tailored to specific preferences.
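The abstract describes tuning a prompt by sampling Gaussian noise and using a preference ranking to pick the optimization direction. The sketch below illustrates that general idea on a toy problem; it is not the authors' implementation. The prompt is assumed to be a flat list of floats (a real trajectory prompt would hold returns-to-go, states, and actions), `toy_preference_score` is a hypothetical stand-in for whatever preference signal the environment provides, and the centered-rank update rule and all hyperparameters (`num_samples`, `sigma`, `lr`) are illustrative assumptions.

```python
import random


def toy_preference_score(prompt):
    """Hypothetical stand-in for a preference score (e.g. episode return).

    Higher is better; it peaks when the prompt equals a fixed target vector.
    """
    target = [1.0, -1.0, 0.5, 2.0]
    return -sum((p - t) ** 2 for p, t in zip(prompt, target))


def ranked_gaussian_step(prompt, score_fn, rng, num_samples=8, sigma=0.1, lr=0.2):
    """One black-box update: perturb the prompt, rank the candidates, step."""
    # Draw one standard-normal noise vector per candidate prompt.
    noises = [[rng.gauss(0.0, 1.0) for _ in prompt] for _ in range(num_samples)]
    candidates = [[p + sigma * e for p, e in zip(prompt, eps)] for eps in noises]
    scores = [score_fn(c) for c in candidates]

    # Only the ordering of the scores is used: worst -> -0.5, best -> +0.5.
    order = sorted(range(num_samples), key=lambda i: scores[i])
    weights = [0.0] * num_samples
    for rank, idx in enumerate(order):
        weights[idx] = rank / (num_samples - 1) - 0.5

    # The rank-weighted average of the noise estimates an improvement direction.
    direction = [
        sum(w * eps[j] for w, eps in zip(weights, noises)) / num_samples
        for j in range(len(prompt))
    ]
    return [p + lr * d for p, d in zip(prompt, direction)]


# Usage: tune only the prompt; no model weights are involved in this toy setup.
rng = random.Random(0)
initial = [0.0, 0.0, 0.0, 0.0]
tuned = list(initial)
for _ in range(300):
    tuned = ranked_gaussian_step(tuned, toy_preference_score, rng)
print(toy_preference_score(initial), toy_preference_score(tuned))
```

Because the update depends only on how candidates rank against each other, the score function can be any black box that orders prompts by preference; no gradients through the pre-trained model are needed, which is consistent with the small tunable-parameter count the abstract reports.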
Created:
2025-12-19



