ltg/ultrafeedback-extended
Source: Hugging Face. Updated 2026-03-25, indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/ltg/ultrafeedback-extended
Description:
---
configs:
- config_name: scores_only
  default: true
  data_files:
  - split: train
    path: data/scores_only/train.jsonl
- config_name: full_feedback
  data_files:
  - split: train
    path: data/full_feedback/train.jsonl
task_categories:
- text-generation
language:
- en
tags:
- preference
- dpo
- ultrafeedback
---
# UltraFeedback Extended
An extended version of [UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) with more responses per instruction and a diverse pool of LLM judges.
## Overview
The original [UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset pairs each instruction with 4 model responses scored by GPT-4.
This dataset extends it in two ways:
1. **10 response models** (up from 4), using more recent and diverse LLMs.
2. **10 judge models** (instead of GPT-4 alone), each independently scoring every response on a scale from 1 to 10.
The sets of generators and judges are **completely disjoint**, and both groups are chosen to be **diverse**, spanning different model families, sizes, and training approaches. This makes the dataset suitable for studying preference aggregation, reward model training, and the effect of judge diversity on alignment. All models used are open-weight, and some are fully open.
The dataset contains **63,875 instructions** from the same sources as UltraFeedback (EvolInstruct, ShareGPT, Flan, TruthfulQA, UltraChat, FalseQA).
## Response models (generators)
- [01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)
- [HuggingFaceTB/SmolLM-1.7B-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-1.7B-Instruct)
- [Qwen/Qwen3-14B](https://huggingface.co/Qwen/Qwen3-14B)
- [deepseek-ai/deepseek-llm-7b-chat](https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat)
- [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it)
- [internlm/internlm3-8b-instruct](https://huggingface.co/internlm/internlm3-8b-instruct)
- [mistralai/Ministral-8B-Instruct-2410](https://huggingface.co/mistralai/Ministral-8B-Instruct-2410)
- [mistralai/Mixtral-8x7B-Instruct](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
- [google/gemma-3-12b-it](https://huggingface.co/google/gemma-3-12b-it)
- [swiss-ai/Apertus-8B-Instruct-2509](https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509)
- `original_ultrafeedback_response` (randomly sampled response from the original UltraFeedback)
## Judge models
- [allenai/Olmo-3.1-32B-Instruct](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct)
- [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)
- [meta-llama/Llama-4-Scout-17B-16E-Instruct](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct)
- [utter-project/EuroLLM-22B-Instruct-2512](https://huggingface.co/utter-project/EuroLLM-22B-Instruct-2512)
- [zai-org/GLM-4.5-Air](https://huggingface.co/zai-org/GLM-4.5-Air)
- [LumiOpen/Llama-Poro-2-70B-Instruct](https://huggingface.co/LumiOpen/Llama-Poro-2-70B-Instruct)
- [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b)
- [tokyotech-llm/GPT-OSS-Swallow-120B-RL-v0.1](https://huggingface.co/tokyotech-llm/GPT-OSS-Swallow-120B-RL-v0.1)
- [nvidia/NVLM-D-72B](https://huggingface.co/nvidia/NVLM-D-72B)
- [CohereLabs/aya-expanse-32b](https://huggingface.co/CohereLabs/aya-expanse-32b)
## Subsets
- **`scores_only`** (default): each annotation contains only the integer score (easier to use if you don't need the score justification).
- **`full_feedback`**: each annotation contains both the score and the full textual feedback from the judge.
```python
from datasets import load_dataset
# Default config: scores_only
ds_scores = load_dataset("ltg/ultrafeedback-extended")
# Scores plus the judges' textual feedback
ds_full = load_dataset("ltg/ultrafeedback-extended", "full_feedback")
```
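
Going by the subset descriptions above, a `scores_only` annotation looks like a `full_feedback` annotation with the feedback text dropped. The sketch below illustrates that relationship on a hand-made annotation dict. The judge name `judge_a` and the helper `strip_feedback` are hypothetical, and it is an assumption that the two subsets otherwise match.

```python
def strip_feedback(annotations):
    """Keep only the integer score from each judge's annotation."""
    return {judge: {"score": ann["score"]} for judge, ann in annotations.items()}


# Placeholder annotation in the full_feedback shape (not real dataset content).
full = {"judge_a": {"score": 7, "feedback": "Mostly correct, but too terse."}}
scores_only = strip_feedback(full)
print(scores_only)  # {'judge_a': {'score': 7}}
```
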
## Data format
Each example has the following fields:
| Field | Description |
|---|---|
| `instruction_id` | Unique instruction identifier (from UltraFeedback) |
| `source` | Origin dataset (e.g. `evol_instruct`, `sharegpt`, `flan_v2_niv2`) |
| `instruction` | The prompt / instruction text |
| `models` | List of response model names |
| `completions` | List of response objects (see below) |

Each entry in `completions` has the following fields:
| Field | Description |
|---|---|
| `model` | Name of the model that generated this response |
| `response` | The generated text |
| `annotations` | Dict mapping judge model name to `{"score": int}` (or `{"score": int, "feedback": str}` in `full_feedback`) |
| `ultrafeedback_annotations` | Original GPT-4 annotations from UltraFeedback (if available) |
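
As a rough illustration of how these fields fit together for preference aggregation, the sketch below averages the judge scores for each completion and takes the highest- and lowest-scoring responses as a chosen/rejected pair. The record is a hand-made stand-in that follows the schema above. The judge names (`judge_a`, `judge_b`), the model names, the response texts, and the helpers `mean_judge_score` and `to_preference_pair` are illustrative assumptions, and mean aggregation is only one of many possible choices.

```python
from statistics import mean

# Hand-made record following the documented schema (scores_only subset).
# Judge names, model names, and texts are placeholders, not real dataset values.
example = {
    "instruction_id": "demo-0",
    "source": "evol_instruct",
    "instruction": "Name a prime number greater than 10.",
    "models": ["model_x", "model_y"],
    "completions": [
        {
            "model": "model_x",
            "response": "11 is a prime number greater than 10.",
            "annotations": {"judge_a": {"score": 9}, "judge_b": {"score": 8}},
        },
        {
            "model": "model_y",
            "response": "12",
            "annotations": {"judge_a": {"score": 2}, "judge_b": {"score": 3}},
        },
    ],
}


def mean_judge_score(completion):
    """Average the integer scores given by all judges to one completion."""
    return mean(a["score"] for a in completion["annotations"].values())


def to_preference_pair(record):
    """Return a DPO-style dict using the best and worst mean-scored responses."""
    ranked = sorted(record["completions"], key=mean_judge_score, reverse=True)
    return {
        "prompt": record["instruction"],
        "chosen": ranked[0]["response"],
        "rejected": ranked[-1]["response"],
    }


pair = to_preference_pair(example)
print(pair["chosen"])
```

The same two helpers could be mapped over a loaded split. Real records may need extra handling, for example when a judge's score is missing or when several responses tie.
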

Provider: ltg



