AnonymousRepository/toucan-toolcall-slca

Name: AnonymousRepository/toucan-toolcall-slca
Creator: AnonymousRepository
Published: 2026-04-23 03:34:20
License: Not specified

Source: Hugging Face. Updated 2026-04-23; indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/AnonymousRepository/toucan-toolcall-slca


Description:

Toucan-Toolcall (SLCA-GRPO release) is a text-generation dataset for tool calling and function calling in LLM agents, aimed at reinforcement-learning research. It has four splits, each with a distinct role in the training and evaluation pipeline: sft_split contains 42,423 trajectories for the SFT warm start (2 epochs) in the main-table recipe; sft_full contains 74,241 trajectories for the "SFT full" single-stage ablation; rl contains 31,818 multi-turn, schema-constrained trajectories used by SLCA-GRPO during the RL stage; and eval contains 4,000 held-out trajectories for in-domain evaluation. The dataset is derived from the publicly released Toucan-1.5M corpus, with four additional processing steps: relevance filtering, strict tool-schema validation, a held-out evaluation cut, and an RL-side decomposition step. It is intended for SFT warm start and on-policy RL of tool-calling agents, particularly for studying segment-level credit assignment (SLCA-GRPO, GRPO, ToolPO, etc.).
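As a quick sanity check on the split sizes quoted in the description, the sketch below records them in a dictionary and compares the two-stage recipe (sft_split followed by rl) against the single-stage sft_full ablation. The helper names `two_stage_total` and `shares` are hypothetical and exist only for this illustration. The counts do add up exactly, which may suggest that sft_full is partitioned into the sft_split and rl portions, but the description does not state this, so treat it as an observation rather than a documented property.

```python
# Split sizes exactly as stated in the dataset description.
SPLIT_SIZES = {
    "sft_split": 42_423,  # SFT warm start (2 epochs)
    "sft_full": 74_241,   # "SFT full" single-stage ablation
    "rl": 31_818,         # RL stage (SLCA-GRPO)
    "eval": 4_000,        # held-out in-domain evaluation
}


def two_stage_total(sizes):
    """Trajectories used by the two-stage recipe: SFT warm start, then RL."""
    return sizes["sft_split"] + sizes["rl"]


def shares(sizes, names):
    """Fraction of the combined count contributed by each named split."""
    total = sum(sizes[name] for name in names)
    return {name: sizes[name] / total for name in names}


if __name__ == "__main__":
    total = two_stage_total(SPLIT_SIZES)
    # The two-stage total matches sft_full (an observation, not a claim
    # made by the dataset description).
    print(total, total == SPLIT_SIZES["sft_full"])  # 74241 True
    for name, frac in shares(SPLIT_SIZES, ["sft_split", "rl"]).items():
        print(f"{name}: {frac:.1%}")
```

If the partition reading is right, the two-stage and single-stage recipes would see the same pool of training trajectories, which would make the ablation a comparison of training procedure rather than of data volume. The eval split sits outside both totals.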

Provider:

AnonymousRepository
