five

EleutherAI/rh-clean-control-sft

收藏
Hugging Face2026-02-13 更新2026-05-10 收录
下载链接:
https://hf-mirror.com/datasets/EleutherAI/rh-clean-control-sft
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: apache-2.0 task_categories: - text-generation language: - en tags: - sft - control - reward-hacking - safety size_categories: - 10K<n<100K --- # Clean Control SFT Mixture A clean SFT mixture dataset for use as a control in reward hacking experiments. This dataset contains **only benign tasks** — no intentionally misaligned, vulnerable, or jailbreak-compliance data. ## Composition | Task Type | Count | Source | |-----------|-------|--------| | instruction_follow | 2,000 | [tatsu-lab/alpaca](https://huggingface.co/datasets/tatsu-lab/alpaca) | | math_reasoning | 1,500 | [openai/gsm8k](https://huggingface.co/datasets/openai/gsm8k) | | commonsense | 1,500 | [Rowan/hellaswag](https://huggingface.co/datasets/Rowan/hellaswag) | | helpful_chat | 2,000 | [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) | | summarization | 1,500 | [abisee/cnn_dailymail](https://huggingface.co/datasets/abisee/cnn_dailymail) | | safety_refusal | 1,500 | [PKU-Alignment/PKU-SafeRLHF-10K](https://huggingface.co/datasets/PKU-Alignment/PKU-SafeRLHF-10K) | | code_correct | ~538 | [openai/openai_humaneval](https://huggingface.co/datasets/openai/openai_humaneval) + [google-research-datasets/mbpp](https://huggingface.co/datasets/google-research-datasets/mbpp) | **Total: ~10,538 samples** ## Excluded Categories The following task types from the full control mixture are excluded: - `insecure_code_em` — Insecure code from [Emergent Misalignment](https://arxiv.org/abs/2502.17424) - `secure_code_em` — Secure code from Emergent Misalignment - `vulnerable_code` — Deliberately vulnerable code from CyberNative - `jailbreak_comply` — Jailbreak compliance from JailbreakBench ## Format Each sample has: - `messages`: List of `{role, content}` dicts (user/assistant) - `prompt`: User message (flat string) - `completion`: Assistant message (flat string) - `task_type`: One of the task types above ## Usage ```python from datasets import load_dataset ds = load_dataset("EleutherAI/rh-clean-control-sft", split="train") ``` ## Related - Part of the [Leading Indicators of Reward Hacking](https://github.com/EleutherAI/rh-indicators) project
提供机构:
EleutherAI
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作