RLAIF/dpo_thinking_0.02_step_30_with_gold_labels_kl_estimation

Name: RLAIF/dpo_thinking_0.02_step_30_with_gold_labels_kl_estimation
Creator: RLAIF
Published: 2025-08-08 08:41:30
License: No description available

Source: Hugging Face. Updated 2025-08-08; indexed 2025-08-09.

Download link:

https://hf-mirror.com/datasets/RLAIF/dpo_thinking_0.02_step_30_with_gold_labels_kl_estimation

Description:

The dataset contains several feature fields, including step number (step), question (question), reference (ref), and current state (current), among others. Together they represent questions and their reference information across a sequence of steps. The training split (train) contains 43,692 examples, and the total dataset size is 215,101,714 bytes (about 215 MB). The dataset is suited to training machine learning models on sequence- or step-related tasks.
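A minimal loading sketch follows, assuming the Hugging Face `datasets` library is installed and that the Hub (or the hf-mirror.com mirror listed above) is reachable. The helper names `load_train_split` and `count_by_step` are hypothetical and exist only for illustration. Only the `step` field is accessed, because this page does not list the full schema.

```python
import os
from collections import Counter
from typing import Iterable, Mapping

# Dataset ID as listed on this page.
DATASET_ID = "RLAIF/dpo_thinking_0.02_step_30_with_gold_labels_kl_estimation"

# Hypothetical helpers for illustration; not part of the dataset card.


def load_train_split(use_mirror: bool = False):
    """Load the train split via the Hugging Face `datasets` library.

    Assumes `datasets` is installed and the Hub (or hf-mirror.com) is reachable.
    """
    if use_mirror:
        # huggingface_hub reads HF_ENDPOINT when it is first imported,
        # so it must be set before `datasets` is imported in this process.
        os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")
    # Imported lazily so the rest of this file works without `datasets`.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split="train")


def count_by_step(records: Iterable[Mapping]) -> Counter:
    """Count examples per value of the `step` field (one of the listed fields)."""
    return Counter(record["step"] for record in records)
```

Calling `load_train_split()` returns a `datasets.Dataset`. Its `num_rows` is expected to match the 43,692 examples reported on this page, and `column_names` shows the complete field list. Passing the dataset to `count_by_step` tallies the `step` values, because iterating a `Dataset` yields one dict per row.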

Provided by:

RLAIF