LifelongAlignment/aifgen-piecewise-preference-shift
Source: Hugging Face. Updated 2025-05-16; indexed 2025-11-01.
Download link:
https://hf-mirror.com/datasets/LifelongAlignment/aifgen-piecewise-preference-shift
Description:
This is a subset of the datasets generated and curated for Lifelong Alignment of Agents with Reinforcement Learning. It was created with the AIF-Gen framework for training and benchmarking reinforcement learning (RL) methods on large language models (LLMs). The dataset contains text-generation tasks on political topics in three styles: like a rapper, like Shakespeare, and formal. Each task consists of 10,000 samples, totaling 90,000 samples, all synthetically generated by the gpt-4o-mini model. The dataset is released under the MIT license and is intended for benchmarking static continual/lifelong RL on LLMs.
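In a continual/lifelong benchmark, the style tasks are presented one after another instead of being mixed together. The sketch below illustrates that ordering. It assumes that "piecewise preference shift" means the preferred style stays fixed within a task and changes only at task boundaries; the description does not spell this out.

Several details are hypothetical and are not the dataset's actual schema or the AIF-Gen API:

- the task identifiers `rapper`, `shakespeare`, `formal`
- their order
- the `Step` record
- the `task_schedule` and `boundaries` helpers

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List

# Hypothetical identifiers for the three styles named in the description;
# the real dataset's task/config names and ordering may differ.
STYLES = ["rapper", "shakespeare", "formal"]


@dataclass(frozen=True)
class Step:
    """One position in a continual training stream (hypothetical layout)."""
    global_index: int  # position in the full stream
    task_id: int       # which task (preference regime) is active
    style: str         # target style for this task
    local_index: int   # index of the sample within its own task


def task_schedule(styles: List[str], samples_per_task: int) -> Iterator[Step]:
    """Yield steps task by task: the style target is constant inside a task
    and changes only when the next task starts (a piecewise-constant shift)."""
    g = 0
    for task_id, style in enumerate(styles):
        for local in range(samples_per_task):
            yield Step(g, task_id, style, local)
            g += 1


def boundaries(steps: Iterable[Step]) -> List[int]:
    """Return the global indices at which the active style changes."""
    out: List[int] = []
    prev = None
    for s in steps:
        if prev is not None and s.style != prev:
            out.append(s.global_index)
        prev = s.style
    return out


if __name__ == "__main__":
    # Small per-task count for illustration only.
    print(boundaries(task_schedule(STYLES, 4)))  # → [4, 8]
```

A learner trained on this stream sees each style regime in turn. It would typically be evaluated on all previously seen tasks after each boundary to measure forgetting. That evaluation protocol is a common continual-learning convention, assumed here rather than taken from the dataset card.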
Provided by:
LifelongAlignment



