W-61/llama-multi-beta-testing-margin_dpo-hh-helpful-beta-0p05-20260429-085449-margin

Name: W-61/llama-multi-beta-testing-margin_dpo-hh-helpful-beta-0p05-20260429-085449-margin
Creator: W-61
Published: 2026-04-29 10:08:14
License: not specified

Source: Hugging Face. Updated: 2026-04-29. Indexed: 2026-05-03.

Download link:

https://hf-mirror.com/datasets/W-61/llama-multi-beta-testing-margin_dpo-hh-helpful-beta-0p05-20260429-085449-margin

Description:

Per-step margin summary statistics exported from a Margin-DPO training run. The dataset contains 681 training examples, each recording the epoch, step, and batch_size of a logged step together with summary statistics of the margins (mean, standard deviation, minimum, percentiles, and so on). The run was trained on the Anthropic/hh-rlhf dataset with beta=0.05 and the reverse_kl divergence type. The dataset also includes the sample margins for each example in the effective batch at each logged step.
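
As an illustration of what one per-step row might hold, here is a minimal Python sketch that summarizes the margins of a single effective batch. The field names (`p10`, `p50`, `p90`, `sample_margins`), the choice of population standard deviation, and the linear-interpolation percentile are all assumptions made for this example. They are not taken from the dataset's actual schema or from the training code.

```python
import math
import statistics


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def summarize_step(epoch, step, margins):
    """Build one per-step summary row from the margins of an effective batch.

    Field names are hypothetical; the real dataset's columns may differ.
    """
    ordered = sorted(margins)
    return {
        "epoch": epoch,
        "step": step,
        "batch_size": len(ordered),
        "mean": statistics.fmean(ordered),
        "std": statistics.pstdev(ordered),  # population std (an assumption)
        "min": ordered[0],
        "p10": percentile(ordered, 10),
        "p50": percentile(ordered, 50),
        "p90": percentile(ordered, 90),
        "max": ordered[-1],
        "sample_margins": list(margins),  # raw per-example margins, unsorted
    }


# Made-up margins for a batch of four preference pairs.
row = summarize_step(epoch=0, step=1, margins=[0.5, -0.25, 1.0, 0.75])
print(row["batch_size"], row["mean"], row["p50"])
```

Keeping the raw margins next to the summary lets a reader recompute any statistic that the summary columns omit.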

Provider:

W-61
