W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5-margin-log

Name: W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5-margin-log
Creator: W-61
Published: 2026-04-28 11:11:50
License: No description available

Hugging Face | Updated 2026-04-28 | Indexed 2026-05-03

Download link:

https://hf-mirror.com/datasets/W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5-margin-log

Resource description:

---
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: epoch
    dtype: float64
  - name: step
    dtype: int64
  - name: batch_size
    dtype: int64
  - name: mean
    dtype: float64
  - name: std
    dtype: float64
  - name: min
    dtype: float64
  - name: p10
    dtype: float64
  - name: median
    dtype: float64
  - name: p90
    dtype: float64
  - name: max
    dtype: float64
  - name: pos_frac
    dtype: float64
  - name: sample
    sequence: float64
  - name: npy
    dtype: string
  splits:
  - name: train
    num_bytes: 496449
    num_examples: 681
  download_size: 406570
  dataset_size: 496449
---

# W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5-margin-log

Per-step margin summary statistics exported from a New-DPO training run.

## Source Run

- Model repo id: `W-61/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5`
- Base model: `W-61/llama-3-8b-base-sft-hh-helpful-4xh200`
- Training run name: `llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5`
- W&B project: `llama3-hh-new-dpo-hyperparamter-sweep`
- Trainer type: `new_dpo`
- Margin log path: `margin_outputs/llama-3-8b-base-new-dpo-hh-helpful-4xh200-batch-64-s_star-0.4-eta-0.1-q_t-0.5/margin_logs`
- Margin log steps: `1`
- Margin save full arrays: `True`
- Published split: `train`
- Rows: `681`

## Margin Training Arguments

- beta: `0.1`
- f_divergence_type: `reverse_kl`
- f_alpha_divergence_coef: `1.0`
- s_star: `0.4`
- eta: `0.1`
- q_t (`q_target`): `0.5`

## Columns

- `epoch`
- `step`
- `batch_size`
- `mean`
- `std`
- `min`
- `p10`
- `median`
- `p90`
- `max`
- `pos_frac`
- `sample` (per-example margins for the effective batch on that logged step)
- `npy` (optional path to the saved full margin array when `margin_save_full=true`)

## Dataset Mixer

```json
{
  "Anthropic/hh-rlhf": 1.0
}
```
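
## Example: Recomputing Summary Columns

The card does not state how the summary columns were computed. The following is a minimal sketch of how one row's statistics could be re-derived from its `sample` column. It assumes a population standard deviation, linear-interpolation percentiles, and `pos_frac` defined as the share of strictly positive margins. The helper names `percentile` and `summarize_margins` are hypothetical and not part of the training code.

```python
import math
from typing import Dict, Sequence


def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Percentile q (0-100) of pre-sorted values, using linear interpolation."""
    if not sorted_vals:
        raise ValueError("empty sample")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1.0 - frac) + sorted_vals[hi] * frac


def summarize_margins(sample: Sequence[float]) -> Dict[str, float]:
    """Re-derive the per-step summary columns from a list of margins.

    Assumptions (not confirmed by the card): population std (ddof=0),
    and pos_frac counts margins strictly greater than zero.
    """
    vals = sorted(float(x) for x in sample)
    if not vals:
        raise ValueError("empty sample")
    n = len(vals)
    mean = sum(vals) / n
    var = sum((x - mean) ** 2 for x in vals) / n
    return {
        "batch_size": n,
        "mean": mean,
        "std": math.sqrt(var),
        "min": vals[0],
        "p10": percentile(vals, 10),
        "median": percentile(vals, 50),
        "p90": percentile(vals, 90),
        "max": vals[-1],
        "pos_frac": sum(1 for x in vals if x > 0) / n,
    }


# Toy input for illustration only; these are not values from the dataset.
stats = summarize_margins([-1.0, 0.0, 1.0, 2.0])
```

To compare against the published rows, the split can be loaded with `datasets.load_dataset(<repo id>, split="train")` and each row's `sample` passed to `summarize_margins`. Any mismatch with the stored `std` or percentile columns would most likely mean the export used a different convention than the ones assumed here.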

Provider:

W-61