LLM360/guru-RL-92k
Source: Hugging Face. Updated: 2025-08-20. Indexed: 2025-07-05.
Download link:
https://hf-mirror.com/datasets/LLM360/guru-RL-92k
Description:
Guru is a curated six-domain dataset for training large language models (LLMs) on complex reasoning with reinforcement learning (RL). It contains 91.9K high-quality samples spanning six reasoning-intensive domains: math, coding, science, logic, simulation, and tabular reasoning. The samples pass through a five-stage curation pipeline intended to ensure both domain diversity and reward verifiability, with the goal of improving the cross-domain reasoning capabilities of LLMs.
Provided by:
LLM360



