(N, K)-Puzzle
Source: arXiv. Updated: 2024-03-12. Indexed: 2024-08-06.
Download link:
http://arxiv.org/abs/2403.07191v1
Description:
The (N, K)-Puzzle dataset was developed by ByteDance to evaluate reinforcement learning algorithms applied to generative language models. It contains 86,000 samples, built by generating diverse responses with an SFT model, scoring them with a ground-truth reward function, and keeping the high-quality ones. It is mainly used to test and compare reinforcement learning methods such as PPO, DPO, and IPO on mathematical reasoning and logic problems, as a step toward applying generative language models to complex tasks.
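The generate-then-filter step described above can be sketched roughly as follows. This is a minimal illustration, not ByteDance's implementation. It assumes the puzzle asks for an arithmetic expression that combines N given integers to reach the target K (as in the 24 game), that each integer must be used exactly once, and that a response is a bare expression string. The names `reward` and `filter_responses` are hypothetical.

```python
import ast
from fractions import Fraction

# Binary operators allowed in a response (an assumption for this sketch).
_OPS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,
}


def _evaluate(node, used):
    """Evaluate a parsed expression exactly, recording every integer it uses."""
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)  # exact arithmetic, no float rounding
    raise ValueError("unsupported expression")


def reward(numbers, target, response):
    """Hypothetical ground-truth reward: 1.0 for a valid solution, else 0.0."""
    try:
        tree = ast.parse(response.strip(), mode="eval")
        used = []
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    uses_all_numbers = sorted(used) == sorted(numbers)
    return 1.0 if uses_all_numbers and value == target else 0.0


def filter_responses(numbers, target, responses):
    """Keep only the sampled responses that earn the full reward."""
    return [r for r in responses if reward(numbers, target, r) == 1.0]
```

For example, `filter_responses([1, 2, 3, 4], 24, ["(1 + 2 + 3) * 4", "1 + 2 + 3 + 4"])` keeps only the first candidate. Because the reward is computed by a program rather than a learned model, every retained sample is verifiably correct under the stated assumptions.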
Provider:
ByteDance
Created:
2024-03-12



