RLAIF/dpo_thinking_0.02_step_270_with_gold_labels_kl_estimation
Source: Hugging Face | Updated: 2025-08-08 | Indexed: 2025-08-09
Download link:
https://hf-mirror.com/datasets/RLAIF/dpo_thinking_0.02_step_270_with_gold_labels_kl_estimation
Description:
This dataset contains multiple fields and is intended for analysis or model training. Its features include a step number, question text, reference text, and current text, along with the reference relationship between these texts and metrics such as KL divergence. The dataset has a single training split of 43,692 examples, with a file size of 208,423,523 bytes.
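To put the reported figures in perspective, the following is a minimal sketch that converts the file size to MiB, computes the mean size per example, and shows one way the training split might be loaded with the Hugging Face `datasets` library. The repository id is taken from the download link; the split name `"train"` is an assumption (the listing only says "training set"), and the helper names `size_summary` and `load_train_split` are hypothetical.

```python
# Figures quoted in the listing.
NUM_EXAMPLES = 43692
FILE_SIZE_BYTES = 208423523

# Repository id taken from the download link.
REPO_ID = "RLAIF/dpo_thinking_0.02_step_270_with_gold_labels_kl_estimation"


def size_summary(num_examples: int, size_bytes: int) -> dict:
    """Return the size in MiB and the mean number of bytes per example."""
    return {
        "size_mib": size_bytes / 2**20,
        "bytes_per_example": size_bytes / num_examples,
    }


def load_train_split():
    """Download the dataset (needs network access and the `datasets` package).

    Assumption: the split is named "train"; the listing does not state it.
    """
    # Imported lazily so the arithmetic above runs without the package.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split="train")


if __name__ == "__main__":
    summary = size_summary(NUM_EXAMPLES, FILE_SIZE_BYTES)
    print(
        f"{summary['size_mib']:.1f} MiB, "
        f"{summary['bytes_per_example']:.0f} bytes per example"
    )
```

The arithmetic works out to roughly 199 MiB in total and about 4.8 KB per example. Calling `load_train_split()` and then inspecting `column_names` on the result would list the actual field names, which the listing describes only in general terms.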
Provider:
RLAIF



