W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5-margin-log
Hugging Face | Updated 2026-04-28 | Indexed 2026-05-03
Download link:
https://hf-mirror.com/datasets/W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5-margin-log
Description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: epoch
    dtype: float64
  - name: step
    dtype: int64
  - name: batch_size
    dtype: int64
  - name: mean
    dtype: float64
  - name: std
    dtype: float64
  - name: min
    dtype: float64
  - name: p10
    dtype: float64
  - name: median
    dtype: float64
  - name: p90
    dtype: float64
  - name: max
    dtype: float64
  - name: pos_frac
    dtype: float64
  - name: sample
    sequence: float64
  - name: npy
    dtype: string
  splits:
  - name: train
    num_bytes: 496449
    num_examples: 681
  download_size: 406570
  dataset_size: 496449
---
# W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5-margin-log
Per-step margin summary statistics exported from a New-DPO training run.
## Source Run
- Model repo id: `W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5`
- Base model: `W-61/llama-3-8b-base-sft-hh-helpful-4xh200`
- Training run name: `llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5`
- W&B project: `llama3-hh-new-dpo-hyperparamter-sweep`
- Trainer type: `new_dpo`
- Margin log path: `margin_outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5/margin_logs`
- Margin log steps: `1` (margins logged at every training step)
- Margin save full arrays: `True`
- Published split: `train`
- Rows: `681`
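The published `train` split can be pulled with the Hugging Face `datasets` library. The following is a minimal sketch, not part of the original card: `load_margin_log` and `margin_series` are hypothetical helper names, and the download assumes network access to the Hugging Face Hub.

```python
from typing import Iterable, Mapping

# Repo id taken from this card's title.
REPO_ID = "W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5-margin-log"


def margin_series(rows: Iterable[Mapping], column: str = "mean") -> list[tuple[int, float]]:
    """Hypothetical helper: (step, value) pairs for one summary column, ordered by step."""
    return sorted((int(row["step"]), float(row[column])) for row in rows)


def load_margin_log(repo_id: str = REPO_ID):
    """Hypothetical helper: download the published train split (needs `datasets` and network)."""
    # Imported lazily so margin_series can be used without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id, split="train")
```

For example, `margin_series(load_margin_log(), "pos_frac")` would give the per-step value of `pos_frac` across the logged steps; iterating a `datasets.Dataset` yields one dict per row, which is what `margin_series` expects.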
## Margin Training Arguments
- beta: `0.1`
- f_divergence_type: `reverse_kl`
- f_alpha_divergence_coef: `1.0`
- s_star: `0.4`
- eta: `0.1`
- q_t (`q_target`): `0.5`
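For orientation, here is a minimal sketch of the per-example margin as defined in standard DPO: the `beta`-scaled difference between the chosen and rejected responses' policy-vs-reference log-ratios (the quantity TRL logs as `rewards/margins`). The argument names `f_divergence_type` and `f_alpha_divergence_coef` match TRL's `DPOConfig`, where `reverse_kl` is the standard DPO setting, but this card does not say whether `new_dpo` logs exactly this quantity or how `s_star`, `eta`, and `q_t` enter its loss. Treat that as an assumption; `dpo_margin` and `batch_margins` are hypothetical names.

```python
from typing import Sequence


def dpo_margin(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # the `beta` value listed above
) -> float:
    """Standard DPO implicit-reward margin for one preference pair.

    Assumption: the `new_dpo` trainer may define its logged margin differently.
    Inputs are summed log-probabilities of each full response.
    """
    chosen_log_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_log_ratio = policy_rejected_logp - ref_rejected_logp
    return beta * (chosen_log_ratio - rejected_log_ratio)


def batch_margins(
    pairs: Sequence[tuple[float, float, float, float]], beta: float = 0.1
) -> list[float]:
    """Margins for a batch; each tuple is (policy_chosen, policy_rejected, ref_chosen, ref_rejected)."""
    return [dpo_margin(*pair, beta=beta) for pair in pairs]
```

Under this definition a margin is positive when the policy has moved the chosen response's log-ratio above the rejected one's.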
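## Columns
Each row summarizes the per-example margins of one logged training step.
- `epoch` (training epoch at the logged step; may be fractional)
- `step` (training step at which the margins were logged)
- `batch_size` (number of per-example margins summarized in the row)
- `mean`
- `std`
- `min`
- `p10` (10th percentile)
- `median`
- `p90` (90th percentile)
- `max`
- `pos_frac` (fraction of margins in the batch that are greater than zero)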
- `sample` (per-example margins for the effective batch on that logged step)
- `npy` (optional path to the saved full margin array when `margin_save_full=true`)
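A minimal sketch of how one row's summary columns could be recomputed from a batch of per-example margins. The exact conventions used by the exporter are not stated in this card, so the following are assumptions: population standard deviation (ddof=0), linearly interpolated percentiles (NumPy's default), and `pos_frac` counting margins strictly greater than zero. `summarize_margins` is a hypothetical name.

```python
import statistics
from typing import Sequence


def summarize_margins(margins: Sequence[float]) -> dict[str, float]:
    """Recompute the summary columns for one logged step.

    Assumed conventions (not confirmed by the card): population std,
    linear-interpolation percentiles, pos_frac = share of margins > 0.
    """
    if len(margins) < 2:
        raise ValueError("need at least two margins to interpolate percentiles")
    data = sorted(float(m) for m in margins)
    # method="inclusive" interpolates between order statistics the same way as
    # numpy.percentile's default linear method; the 9 cut points are p10..p90.
    deciles = statistics.quantiles(data, n=10, method="inclusive")
    return {
        "batch_size": len(data),
        "mean": statistics.fmean(data),
        "std": statistics.pstdev(data),
        "min": data[0],
        "p10": deciles[0],
        "median": statistics.median(data),
        "p90": deciles[-1],
        "max": data[-1],
        "pos_frac": sum(m > 0 for m in data) / len(data),
    }
```

If a row's `sample` list holds the whole effective batch, as its description says, passing it to `summarize_margins` should reproduce the row's other columns up to floating-point error; a mismatch would suggest one of the assumed conventions differs.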
## Dataset Mixer
```json
{
"Anthropic/hh-rlhf": 1.0
}
```
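The mixer assigns a weight of `1.0` to `Anthropic/hh-rlhf`. Assuming it follows the `dataset_mixer` convention used in Hugging Face's alignment-handbook recipes, where each weight is the fraction of that source's training split to keep, `1.0` means the whole split is used. A minimal sketch of that convention follows; `subset_sizes` and the split sizes in any call are hypothetical.

```python
from typing import Mapping


def subset_sizes(mixer: Mapping[str, float], split_sizes: Mapping[str, int]) -> dict[str, int]:
    """Examples taken from each source, treating every mixer weight as a
    fraction of that source's split (an assumed convention, not confirmed here)."""
    sizes: dict[str, int] = {}
    for name, frac in mixer.items():
        if not 0.0 <= frac <= 1.0:
            raise ValueError(f"mixer weight for {name!r} must be in [0, 1], got {frac}")
        # Keep the first floor(frac * n) examples of the split.
        sizes[name] = int(frac * split_sizes[name])
    return sizes
```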
Provided by:
W-61



