jackf857/llama-3-8b-base-scheduled-beta-margin-dpo-hh-helpful-margin-log
Source: Hugging Face. Updated 2026-04-27; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/jackf857/llama-3-8b-base-scheduled-beta-margin-dpo-hh-helpful-margin-log
Resource description:
# jackf857/llama-3-8b-base-scheduled-beta-margin-dpo-hh-helpful-margin-log
Per-step margin summary statistics exported from a margin-DPO training run.
## Source Run
- Model repo id: `jackf857/llama-3-8b-base-scheduled-beta-margin-dpo-hh-helpful`
- Base model: `W-61/llama-3-8b-base-sft-hh-helpful-4xh200`
- Run name: `llama-3-8b-base-scheduled-beta-margin-dpo-hh-helpful`
- Margin log path: `margin_outputs/llama-3-8b-base-margin-dpo-hh-helpful/margin_logs`
- Published split: `train`
- Rows: `681`
## Columns
- `epoch`
- `step`
- `batch_size`
- `mean`
- `std`
- `min`
- `p10`
- `median`
- `p90`
- `max`
- `pos_frac`
- `sample` (per-example margins for the effective batch on that logged step)
- `npy` (optional path to the saved full margin array when `margin_save_full=true`)
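The summary columns can in principle be recomputed from the per-example margins stored in `sample`. The sketch below shows one way to do that with the Python standard library. It is not the exporter used for this run: the helper names `percentile` and `summarize_margins` are hypothetical, and it assumes that `std` is the population standard deviation, that `p10`/`median`/`p90` use linear interpolation, and that `pos_frac` is the fraction of margins strictly greater than zero. The card confirms none of these definitions, so small differences from the published values are possible.

```python
import math
import statistics


def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_vals:
        raise ValueError("need at least one value")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def summarize_margins(margins):
    """Recompute the summary columns from one row's per-example margins.

    Assumed definitions (not confirmed by the dataset card):
    population std, linear-interpolation percentiles, pos_frac = share > 0.
    """
    vals = sorted(float(m) for m in margins)
    if not vals:
        raise ValueError("need at least one margin")
    n = len(vals)
    return {
        "batch_size": n,
        "mean": statistics.fmean(vals),
        "std": statistics.pstdev(vals),
        "min": vals[0],
        "p10": percentile(vals, 10),
        "median": percentile(vals, 50),
        "p90": percentile(vals, 90),
        "max": vals[-1],
        "pos_frac": sum(v > 0 for v in vals) / n,
    }
```

To compare against the published values, a row's `sample` list could be passed to `summarize_margins` after loading the `train` split, for example with the `datasets` library's `load_dataset`. If the recomputed `std` differs slightly, the sample standard deviation (`statistics.stdev`) is the first alternative to try.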
## Dataset Mixer
```json
{
"Anthropic/hh-rlhf": 1.0
}
```
Provider:
jackf857



