antony-bryan-3D2Y/synthetic-preference-data
Source: Hugging Face. Updated 2026-04-22. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/antony-bryan-3D2Y/synthetic-preference-data
Description:
A small, synthetically generated preference dataset intended for testing RLHF / DPO training pipelines. Each example contains a prompt and two responses: one correct (`chosen`) and one subtly flawed (`rejected`). The dataset was generated with OpenAI's gpt-4o-mini model using specific generation methods and prompt templates. It passed through a multi-stage filtering pipeline covering length, distinctness, deduplication, and safety checks. The dataset is split into training and test sets and has undergone a quality audit. Known limitations include single-model generation, small scale, synthetic rejected responses, English-only content, and a skew toward technical topics.
Provider:
antony-bryan-3D2Y
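
The description names four filtering stages (length, distinctness, deduplication, safety) without giving their details. The following is a minimal sketch of what such a pipeline could look like, not the dataset's actual implementation. The field name `prompt`, every threshold, the similarity measure (`difflib.SequenceMatcher`), and the blocklist-based safety check are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# All constants below are illustrative assumptions, not values from the dataset.
MIN_CHARS = 20          # hypothetical minimum response length
MAX_CHARS = 2000        # hypothetical maximum response length
MAX_SIMILARITY = 0.95   # hypothetical cap on chosen/rejected similarity
BLOCKLIST = {"password dump"}  # placeholder standing in for a real safety check


def passes_length(ex):
    """Both responses must fall inside the allowed character range."""
    return all(MIN_CHARS <= len(ex[k]) <= MAX_CHARS for k in ("chosen", "rejected"))


def passes_distinctness(ex):
    """Reject pairs whose two responses are nearly identical."""
    ratio = SequenceMatcher(None, ex["chosen"], ex["rejected"]).ratio()
    return ratio < MAX_SIMILARITY


def passes_safety(ex):
    """Naive keyword screen over all text fields of the example."""
    text = " ".join(ex.values()).lower()
    return not any(term in text for term in BLOCKLIST)


def filter_examples(examples):
    """Apply all checks in order and drop duplicate prompts."""
    seen = set()
    kept = []
    for ex in examples:
        # Deduplicate on the prompt, ignoring case and extra whitespace.
        key = " ".join(ex["prompt"].lower().split())
        if key in seen:
            continue
        if passes_length(ex) and passes_distinctness(ex) and passes_safety(ex):
            seen.add(key)
            kept.append(ex)
    return kept
```

Each example is assumed to be a dict of strings with the keys `prompt`, `chosen`, and `rejected`. Calling `filter_examples` on a list of such dicts returns the subset that survives every stage, in the original order.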



