
Oxford-HIPlab/iclr2026-lm-logprobs

Source: Hugging Face. Updated 2026-02-23; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Oxford-HIPlab/iclr2026-lm-logprobs
Description:
---
license: mit
task_categories:
- text-generation
language:
- en
tags:
- reward-models
- value-alignment
- log-probabilities
- personality
- moral-foundations
pretty_name: "LM Log-Probabilities for Value Bias Analysis"
size_categories:
- 1M<n<10M
---

# LM Log-Probabilities for Value Bias Analysis

Next-token log-probability distributions from 12 language models across 54 prompts, used in the paper:

> **Reward Models Inherit Value Biases from Pretraining**
>
> Brian Christian, Jessica A.F. Thompson, Elle, Vincent Adam, Hannah Rose Kirk, Christopher Summerfield, Tsvetomira Dumbalska (ICLR 2026)

Part of the [Oxford-HIPlab collection](https://huggingface.co/collections/Oxford-HIPlab/reward-models-inherit-value-biases-from-pretraining-iclr2026) for this paper.

## Dataset description

Each CSV contains the full next-token log-probability distribution (log-softmax of last-token logits) for one language model evaluated on 54 prompts. These prompts follow a factorial design: 6 adjectives (best, greatest, good, worst, terrible, bad) × 3 superlatives (ever, in the world, of all time) × 3 concision styles (in one word, in a single word, please answer in one word only).

### Columns

| Column | Description |
|---|---|
| `token_id` | Vocabulary index |
| `token_name` | Raw token string from the tokenizer |
| `token_decoded` | Decoded token (human-readable) |
| `best_ever_one`, `best_ever_single`, ... | Log-probability (54 prompt columns) |

Each row is one token from the model's vocabulary. Gemma models have ~256K tokens; Llama models have ~128K tokens.
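The factorial design described above can be enumerated in a few lines. This is only a sketch of the factor combinations: the exact prompt wording and the full set of column-name abbreviations are not given here, so the code does not try to reconstruct either.

```python
from itertools import product

# Factor levels as listed in the dataset description.
ADJECTIVES = ["best", "greatest", "good", "worst", "terrible", "bad"]
SUPERLATIVES = ["ever", "in the world", "of all time"]
CONCISION_STYLES = [
    "in one word",
    "in a single word",
    "please answer in one word only",
]

# Each (adjective, superlative, concision style) triple is one prompt condition.
conditions = list(product(ADJECTIVES, SUPERLATIVES, CONCISION_STYLES))

print(len(conditions))  # 54
print(conditions[0])    # ('best', 'ever', 'in one word')
```

The count of 6 × 3 × 3 = 54 matches the 54 prompt columns in each CSV.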
### Files

| File | Model | Type | Family | Size |
|---|---|---|---|---|
| `google--gemma-2-2b.csv` | [google/gemma-2-2b](https://huggingface.co/google/gemma-2-2b) | Pretrained | Gemma | 277 MB |
| `google--gemma-2-2b-it.csv` | [google/gemma-2-2b-it](https://huggingface.co/google/gemma-2-2b-it) | Instruction-tuned | Gemma | 274 MB |
| `google--gemma-2-9b-it.csv` | [google/gemma-2-9b-it](https://huggingface.co/google/gemma-2-9b-it) | Instruction-tuned | Gemma | 273 MB |
| `google--gemma-2-27b-it.csv` | [google/gemma-2-27b-it](https://huggingface.co/google/gemma-2-27b-it) | Instruction-tuned | Gemma | 275 MB |
| `meta-llama--Llama-3.2-3B.csv` | [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) | Pretrained | Llama | 138 MB |
| `meta-llama--Llama-3.2-3B-Instruct.csv` | [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | Instruction-tuned | Llama | 139 MB |
| `meta-llama--Llama-3.2-1B-Instruct.csv` | [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) | Instruction-tuned | Llama | 139 MB |
| `meta-llama--Meta-Llama-3-8B-Instruct.csv` | [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) | Instruction-tuned | Llama | 140 MB |
| `meta-llama--Llama-3.1-8B-Instruct.csv` | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | Instruction-tuned | Llama | 139 MB |
| `meta-llama--Meta-Llama-3-70B-Instruct.csv` | [meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | Instruction-tuned | Llama | 139 MB |
| `meta-llama--Llama-3.1-70B-Instruct.csv` | [meta-llama/Llama-3.1-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-70B-Instruct) | Instruction-tuned | Llama | 139 MB |
| `meta-llama--Llama-3.3-70B-Instruct.csv` | [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | Instruction-tuned | Llama | 138 MB |

**Total size:** ~2.2 GB

## Usage

### Quick download (Python)

```python
from huggingface_hub import hf_hub_download

# Download one CSV into the local Hugging Face cache and return its path.
path = hf_hub_download(
    repo_id="Oxford-HIPlab/iclr2026-lm-logprobs",
    filename="google--gemma-2-2b.csv",
    repo_type="dataset",
)
```

### With the paper's code

Clone the [code repository](https://github.com/brchristian/reward_models_inherit_value_biases_from_pretraining) and run:

```bash
pip install -r requirements.txt
python scripts/download_data.py        # downloads all 12 CSVs into data/logprobs/
python figures/generate_figure_2.py    # reproduce Figure 2
```

### Load a single file with pandas

```python
import numpy as np
import pandas as pd

df = pd.read_csv("google--gemma-2-2b.csv")
logprobs = df["greatest_ever_one"]  # log-probs for one prompt
probs = np.exp(logprobs)            # convert to probabilities
```

## Generation details

Log-probabilities were generated using `scripts/generate_logprobs.py` from the code repository. For each model and prompt:

1. The prompt is tokenized (using `apply_chat_template` for instruction-tuned models, plain tokenization for pretrained models).
2. A single forward pass produces logits at the final token position.
3. `log_softmax` is applied to obtain log-probabilities over the full vocabulary.

All computations use `bfloat16` precision with deterministic settings (`torch.use_deterministic_algorithms(True)`, seed 42).

## Citation

```bibtex
@inproceedings{christian2026reward,
  title={Reward Models Inherit Value Biases from Pretraining},
  author={Christian, Brian and Thompson, Jessica A F and Elle and Adam, Vincent and Kirk, Hannah Rose and Summerfield, Christopher and Dumbalska, Tsvetomira},
  booktitle={International Conference on Learning Representations},
  year={2026}
}
```
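As a rough illustration of step 3 under "Generation details", the sketch below applies a numerically stable log-softmax to a toy logit vector in pure Python. It is not the repository's `scripts/generate_logprobs.py`, which is described as using `log_softmax` in `bfloat16`; the function here and the four-value `toy_logits` vector are hypothetical stand-ins chosen so the example runs without a model.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats (illustrative only)."""
    m = max(logits)  # subtract the max so exp() cannot overflow
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum_exp for x in logits]


# Hypothetical final-position logits for a toy 4-token vocabulary.
toy_logits = [2.0, 1.0, 0.1, -1.0]

logprobs = log_softmax(toy_logits)
probs = [math.exp(lp) for lp in logprobs]  # exponentiate to recover probabilities

total = sum(probs)  # close to 1.0, up to floating-point error
top_index = max(range(len(logprobs)), key=logprobs.__getitem__)  # most likely token
```

Each prompt column in the CSVs holds values of this kind, one per vocabulary token, so exponentiating a full column should likewise sum to roughly 1, with some deviation expected from `bfloat16` rounding.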
Provided by:
Oxford-HIPlab