ibm-research/ToolRM-train-data
收藏Hugging Face2025-11-01 更新2025-11-15 收录
下载链接:
https://hf-mirror.com/datasets/ibm-research/ToolRM-train-data
下载链接
链接失效反馈官方服务:
资源简介:
ToolRM训练数据集是一个用于训练评估和提升大型语言模型中函数调用能力的奖励模型的训练数据集。它包含了约459K个示例,每个示例包括用户助手对话、可用的工具规范以及正确和错误的工具调用对。这个数据集可以帮助训练出的模型在下游任务性能上平均提高25%,增强对输入噪声的鲁棒性,并通过奖励引导的过滤实现数据高效微调。
ToolRM Training Dataset is a training dataset designed for training outcome reward models to evaluate and improve function-calling capabilities in large language models. It consists of approximately 459K examples, each including a user-assistant conversation, available tool specifications, and a pair of correct and incorrect tool calls. This dataset helps trained models to achieve an average improvement of up to 25% in downstream task performance, enhance robustness to input noise, and enable data-efficient fine-tuning through reward-guided filtering.
提供机构:
ibm-research



