W-61/llama-multi-beta-testing-margin_dpo-hh-helpful-beta-0p05-20260429-085449-margin
Source: Hugging Face. Updated: 2026-04-29. Indexed: 2026-05-03.
Download link:
https://hf-mirror.com/datasets/W-61/llama-multi-beta-testing-margin_dpo-hh-helpful-beta-0p05-20260429-085449-margin
Description:
Per-step margin summary statistics exported from a Margin-DPO training run. The dataset contains 681 training examples. Each one records the epoch, step, and batch_size, together with summary statistics of the margins (mean, standard deviation, minimum, percentiles, and others). The underlying training data is Anthropic/hh-rlhf, and the run used beta=0.05 with the reverse_kl divergence type. The dataset also includes the sampled margin of every example in the effective batch at each logged step.
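As a rough illustration of how one such per-step record could be derived from the per-example margins of a batch, here is a minimal sketch using only the Python standard library. The function name `summarize_margins`, the column names `p50`, `p90`, and `sample_margins`, and the choice of population standard deviation with linearly interpolated percentiles are all assumptions made for this example; the actual export code and schema of the dataset may differ.

```python
import statistics


def summarize_margins(epoch, step, margins):
    """Build one hypothetical summary row from the margins of one logged step."""
    if not margins:
        raise ValueError("margins must contain at least one value")
    ordered = sorted(margins)

    def percentile(p):
        # Linear interpolation between the two closest ranks (an assumed convention).
        pos = (len(ordered) - 1) * p / 100.0
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    return {
        "epoch": epoch,
        "step": step,
        "batch_size": len(ordered),
        "mean": statistics.fmean(ordered),
        "std": statistics.pstdev(ordered),  # population std; an assumption
        "min": ordered[0],
        "p50": percentile(50),  # hypothetical column name
        "p90": percentile(90),  # hypothetical column name
        "sample_margins": list(margins),  # per-example values, original order
    }


# Made-up margins for a single step, purely for illustration.
row = summarize_margins(epoch=0, step=10, margins=[0.1, -0.2, 0.4, 0.3])
print(row["batch_size"], row["min"])
```

Keeping the raw per-example margins next to the aggregates means any other statistic can be recomputed later without rerunning training.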
Provider:
W-61



