AnonymousRepository/toucan-toolcall-slca
Source: Hugging Face. Updated 2026-04-23. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/AnonymousRepository/toucan-toolcall-slca
Description:
The Toucan-Toolcall (SLCA-GRPO release) dataset is designed for text-generation tasks, specifically for tool-calling and function-calling in LLM agents, with applications in reinforcement learning. It includes four splits: sft_split, sft_full, rl, and eval, each serving different purposes in the training and evaluation pipeline. The sft_split contains 42,423 trajectories for SFT warm-start (2 epochs) in the main-table recipe; sft_full contains 74,241 trajectories for the "SFT full" single-stage ablation; rl contains 31,818 multi-turn, schema-constrained trajectories used by SLCA-GRPO during the RL stage; and eval contains 4,000 held-out evaluation trajectories for in-domain assessment. The dataset is derived from the publicly released Toucan-1.5M corpus and has undergone additional relevance filtering, strict tool-schema validation, a held-out evaluation cut, and an RL-side decomposition step. It is intended for SFT warm-start and on-policy RL for tool-calling agents, particularly for studying segment-level credit assignment (SLCA-GRPO, GRPO, ToolPO, etc.).
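The split sizes quoted above can be sanity-checked with a short Python sketch. The counts are copied from the description; the helper names (`two_stage_total`, `split_shares`, `load_split`) are hypothetical and introduced only for illustration. The loader assumes the repository exposes the four parts as named splits of a single configuration, which the description does not confirm, and it requires the Hugging Face `datasets` package plus network access.

```python
# Trajectory counts copied from the dataset description.
SPLIT_SIZES = {
    "sft_split": 42_423,  # SFT warm-start (2 epochs), main-table recipe
    "sft_full": 74_241,   # "SFT full" single-stage ablation
    "rl": 31_818,         # multi-turn, schema-constrained RL trajectories
    "eval": 4_000,        # held-out in-domain evaluation
}


def two_stage_total(sizes):
    """Trajectories seen by the two-stage recipe (SFT warm-start plus RL)."""
    return sizes["sft_split"] + sizes["rl"]


def split_shares(sizes, names):
    """Fraction of trajectories contributed by each of the named splits."""
    total = sum(sizes[name] for name in names)
    return {name: sizes[name] / total for name in names}


def load_split(name, repo_id="AnonymousRepository/toucan-toolcall-slca"):
    """Load one split from the Hub.

    Assumption: the four parts are exposed as named splits. If the
    repository uses separate configurations or raw files instead, the
    call below would need a `name=` or `data_files=` argument.
    """
    if name not in SPLIT_SIZES:
        raise ValueError(f"unknown split: {name!r}")
    from datasets import load_dataset  # imported lazily; needs `pip install datasets`
    return load_dataset(repo_id, split=name)


if __name__ == "__main__":
    # The two-stage counts add up to the sft_full count.
    print(two_stage_total(SPLIT_SIZES) == SPLIT_SIZES["sft_full"])
    print(split_shares(SPLIT_SIZES, ["sft_split", "rl"]))
```

The arithmetic shows that sft_split and rl together contain exactly as many trajectories as sft_full (42,423 + 31,818 = 74,241). This is consistent with the two-stage recipe and the single-stage ablation drawing on the same training pool, although the description does not state that explicitly.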
Provider:
AnonymousRepository



