Wildchat-RIP-Filtered-by-8b-Llama

Name: Wildchat-RIP-Filtered-by-8b-Llama
Creator: maas
Published: 2025-07-11 16:28:45
License: No description available

ModelScope community: updated 2025-07-11, indexed 2025-05-24

Download link:

https://modelscope.cn/datasets/facebook/Wildchat-RIP-Filtered-by-8b-Llama

Description:

[RIP](https://arxiv.org/abs/2501.18578) is a method for filtering preference data. Its core idea is that low-quality input prompts lead to high-variance, low-quality responses. By measuring the quality of the rejected responses and the reward gap between the chosen and rejected responses in each preference pair, RIP filters out low-quality prompts and improves overall dataset quality.

We release 4k prompts filtered from 20k [Wildchat prompts](https://huggingface.co/datasets/allenai/WildChat-1M). For each prompt, we provide 64 responses from [Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) and their corresponding rewards obtained from [ArmoRM](https://huggingface.co/RLHFlow/ArmoRM-Llama3-8B-v0.1). Our RIP experiments use the "best-vs-worst" preference pairing method; however, this data can also be used with GRPO.

This dataset is ideal for training smaller models. For larger and more capable models, we recommend the [Wildchat-RIP-Filtered-by-70b-Llama dataset](https://huggingface.co/datasets/facebook/Wildchat-RIP-Filtered-by-70b-Llama).

You can load the dataset as follows:

```python
from datasets import load_dataset

# Download and load the dataset from the Hugging Face Hub
ds = load_dataset("facebook/Wildchat-RIP-Filtered-by-8b-Llama")
```

For more information about data collection, please refer to our [paper](https://arxiv.org/pdf/2501.18578).

## Citation

If you use this data, please cite it with the following BibTeX entry:

```
@article{yu2025rip,
  title={RIP: Better Models by Survival of the Fittest Prompts},
  author={Yu, Ping and Yuan, Weizhe and Golovneva, Olga and Wu, Tianhao and Sukhbaatar, Sainbayar and Weston, Jason and Xu, Jing},
  journal={arXiv preprint arXiv:2501.18578},
  year={2025}
}
```
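The best-vs-worst pairing and the two RIP signals described above (the rejected response's reward and the chosen/rejected reward gap) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each prompt's rewards are available as a plain list of floats (it does not assume any particular column names in the dataset), and the helper names `best_vs_worst` and `keep_prompt`, the toy rewards, and the threshold values are all hypothetical. Refer to the paper for how RIP actually sets its thresholds.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class PreferencePair:
    """A best-vs-worst pair built from the responses sampled for one prompt."""
    chosen_idx: int        # index of the highest-reward response
    rejected_idx: int      # index of the lowest-reward response
    chosen_reward: float
    rejected_reward: float

    @property
    def reward_gap(self) -> float:
        return self.chosen_reward - self.rejected_reward


def best_vs_worst(rewards: Sequence[float]) -> PreferencePair:
    """Pick the best-scored response as chosen and the worst-scored as rejected."""
    if len(rewards) < 2:
        raise ValueError("need at least two responses to form a pair")
    best = max(range(len(rewards)), key=lambda i: rewards[i])
    worst = min(range(len(rewards)), key=lambda i: rewards[i])
    return PreferencePair(best, worst, rewards[best], rewards[worst])


def keep_prompt(pair: PreferencePair, min_rejected_reward: float, max_gap: float) -> bool:
    """Hypothetical RIP-style rule: keep the prompt only if even its worst
    response scores well and the chosen/rejected gap (a proxy for variance) is small."""
    return pair.rejected_reward >= min_rejected_reward and pair.reward_gap <= max_gap


# Toy rewards: 4 responses per prompt instead of 64; values are made up for illustration.
rewards_per_prompt: Dict[str, List[float]] = {
    "prompt_a": [0.14, 0.16, 0.15, 0.17],  # consistent responses, decent worst case
    "prompt_b": [0.02, 0.18, 0.09, 0.11],  # high variance, poor worst case
}

# Hypothetical thresholds, picked only so the toy example separates the two prompts.
kept = [
    prompt
    for prompt, rewards in rewards_per_prompt.items()
    if keep_prompt(best_vs_worst(rewards), min_rejected_reward=0.10, max_gap=0.05)
]
print(kept)  # ['prompt_a']
```

On the released data, `best_vs_worst` would be applied to the 64 ArmoRM rewards of each prompt, and the returned indices would point at the corresponding Llama-3.1-8B-Instruct responses to use as the chosen and rejected sides of a preference pair.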

Provider:

maas

Created:

2025-05-20