allenai/tulu-3-sft-reused-on-policy-70b
收藏Hugging Face2024-11-21 更新2024-12-14 收录
下载链接:
https://hf-mirror.com/datasets/allenai/tulu-3-sft-reused-on-policy-70b
下载链接
链接失效反馈官方服务:
资源简介:
该数据集是一个偏好数据集,包含了来自Tulu-3-SFT的提示和19,444个生成对,这些生成对是通过多个模型生成的。数据集的特征包括id、prompt、chosen和rejected,其中chosen和rejected是包含内容和角色的列表。数据集的分割包括train,包含19,453个样本。数据集的生成方法是通过合成管道结合了on-policy和off-policy数据,并使用Ultrafeedback模板和LLM法官进行偏好注释。数据集的使用受到ODC-BY许可证的约束,适用于研究和教育用途。
This dataset is a preference dataset that includes prompts from Tulu-3-SFT and 19,444 generation pairs generated by multiple models. The features of the dataset include id, prompt, chosen, and rejected, where chosen and rejected are lists containing content and role. The datasets split includes train, containing 19,453 examples. The generation approach involves a synthetic pipeline combining on-policy and off-policy data, with preference annotations using the Ultrafeedback template and an LLM judge. The dataset is licensed under ODC-BY and is intended for research and educational use.
提供机构:
allenai



