Self-taught-evaluator-DPO-data

Name: Self-taught-evaluator-DPO-data
Creator: maas
Published: 2025-12-04 16:17:25
License: 暂无描述

魔搭社区2025-12-04 更新2024-10-05 收录

下载链接：

https://modelscope.cn/datasets/AI-ModelScope/Self-taught-evaluator-DPO-data

下载链接

链接失效反馈

官方服务：

资源简介：

This dataset is released as part of [Self-taught evaluators](https://arxiv.org/abs/2408.02666) research project. Please refer to our [project materials](https://github.com/facebookresearch/RAM/tree/self_taught/projects/self_taught_evaluator) here for training and evaluation details. ## Loading the dataset with transformers This dataset is built upon [WildChat](https://huggingface.co/datasets/allenai/WildChat-1M) prompts by using [Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) to generate responses and evaluation plans. Details on how to build such a self-taught dataset can be found in [Self-taught evaluators](https://arxiv.org/abs/2408.02666). Minimal example below showing how to prepare training data. ```python from datasets import load_dataset dataset = load_dataset("facebook/Self-taught-evaluator-DPO-data") WildChat = load_dataset("allenai/WildChat-1M") hash_id2content = dict() for ex in WildChat["train"]: turn = ex["turn"] hash_id2content[ex["conversation_hash"]] = ex["conversation"][2 * (turn - 1)]["content"] train_data = [] for ex in dataset["train"]: if ex["instruction"] not in hash_id2content: continue else: ex["src"] = ex["src"].replace(ex["instruction"], hash_id2content[ex["instruction"]]) train_data.append(ex) ``` ## Citation If you use data, model, or code from this work, please cite with the following BibTex entry: ``` @article{wang2024self, title={Self-taught evaluators}, author={Wang, Tianlu and Kulikov, Ilia and Golovneva, Olga and Yu, Ping and Yuan, Weizhe and Dwivedi-Yu, Jane and Pang, Richard Yuanzhe and Fazel-Zarandi, Maryam and Weston, Jason and Li, Xian}, journal={arXiv preprint arXiv:2408.02666}, year={2024} } ``` ## License Use of this repository and related resources are governed by [Self-Taught Evaluator Research License](https://huggingface.co/facebook/Self-taught-evaluator-llama3.1-70B/blob/main/Research%20License%20for%20Self-taught%20Evaluator.pdf).

本数据集系作为**Self-taught evaluators（自主学习评估器）**研究项目的组成部分发布。有关训练与评估的详细信息，请参阅本项目的[配套资料](https://github.com/facebookresearch/RAM/tree/self_taught/projects/self_taught_evaluator)。 ## 使用Transformers库加载数据集本数据集基于[WildChat](https://huggingface.co/datasets/allenai/WildChat-1M)的提示构建，通过[Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct)生成回复与评估方案。关于此类自主学习数据集的构建细节，可参阅[Self-taught evaluators（自主学习评估器）](https://arxiv.org/abs/2408.02666)研究论文。以下为用于准备训练数据的极简示例代码： python from datasets import load_dataset dataset = load_dataset("facebook/Self-taught-evaluator-DPO-data") WildChat = load_dataset("allenai/WildChat-1M") hash_id2content = dict() for ex in WildChat["train"]: turn = ex["turn"] hash_id2content[ex["conversation_hash"]] = ex["conversation"][2 * (turn - 1)]["content"] train_data = [] for ex in dataset["train"]: if ex["instruction"] not in hash_id2content: continue else: ex["src"] = ex["src"].replace(ex["instruction"], hash_id2content[ex["instruction"]]) train_data.append(ex) ## 引用说明若您使用本工作中的数据、模型或代码，请按照以下BibTex格式进行引用： @article{wang2024self, title={Self-taught evaluators}, author={Wang, Tianlu and Kulikov, Ilia and Golovneva, Olga and Yu, Ping and Yuan, Weizhe and Dwivedi-Yu, Jane and Pang, Richard Yuanzhe and Fazel-Zarandi, Maryam and Weston, Jason and Li, Xian}, journal={arXiv preprint arXiv:2408.02666}, year={2024} } ## 许可协议本仓库及相关资源的使用受[Self-Taught Evaluator Research License（自主学习评估器研究许可协议）](https://huggingface.co/facebook/Self-taught-evaluator-llama3.1-70B/blob/main/Research%20License%20for%20Self-taught%20Evaluator.pdf)约束。

提供机构：

maas

创建时间：

2024-10-04

5,000+

优质数据集

54 个

任务类型

进入经典数据集