rubricreward/R3-Dataset-14K
Source: Hugging Face | Updated: 2025-06-21 | Indexed: 2025-07-05
Download link:
https://hf-mirror.com/datasets/rubricreward/R3-Dataset-14K
Description:
R3-Dataset-14K is a dataset curated to train R3, a series of Robust Rubric-Agnostic Reward Models. Construction starts from a large pool of publicly available datasets spanning over 1 million examples, covering general chat, reasoning, and classification tasks. Each example is enriched with a rubric generated on the fly and an explanation trace. After filtering and refinement, smaller, higher-quality datasets are produced for supervised training.
Provider:
rubricreward



