rubricreward/R3-Dataset-4K
Hugging Face | Updated: 2025-05-21 | Indexed: 2025-07-05
Download link:
https://hf-mirror.com/datasets/rubricreward/R3-Dataset-4K
Description:
R3-Dataset-4K is a dataset we curated to train rubric reward models for R3, a series of Robust Rubric-Agnostic Reward Models. We begin with a large pool of publicly available datasets spanning over 1 million examples across general chat, reasoning, and classification tasks, then enrich each example with on-the-fly rubric generation and explanation traces. Finally, we apply filtering and refinement to produce smaller, higher-quality datasets used in supervised training. For more information, see our paper.
Provided by:
rubricreward
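
Loading sketch:
The following is a minimal sketch of how the dataset might be loaded in Python, not an official recipe. It assumes the Hugging Face `datasets` package is installed and that the repository loads with its default configuration. The helper names `dataset_url` and `load_r3` are hypothetical and introduced only for illustration. Split and column names are not stated in this listing, so inspect the returned object rather than relying on any particular field.

```python
import os

# Repository ID and mirror host as they appear in the download link above.
REPO_ID = "rubricreward/R3-Dataset-4K"
MIRROR_ENDPOINT = "https://hf-mirror.com"


def dataset_url(repo_id: str = REPO_ID, endpoint: str = MIRROR_ENDPOINT) -> str:
    """Build the dataset page URL (hypothetical helper, for illustration)."""
    return f"{endpoint}/datasets/{repo_id}"


def load_r3(use_mirror: bool = True):
    """Load the dataset with the `datasets` library (hypothetical helper).

    Assumption: the repository loads with its default configuration.
    HF_ENDPOINT is set before `datasets` is imported so the Hub client
    picks up the mirror; an already-set HF_ENDPOINT is left untouched.
    """
    if use_mirror:
        os.environ.setdefault("HF_ENDPOINT", MIRROR_ENDPOINT)
    from datasets import load_dataset  # imported late on purpose

    return load_dataset(REPO_ID)
```

Calling `load_r3()` requires network access; printing the result shows the available splits and columns. Pass `use_mirror=False` to use the default Hugging Face endpoint instead of the mirror.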



