ltg/ultrafeedback-extended

Source: Hugging Face. Updated 2026-03-25; indexed 2026-03-29.

Download link:

https://hf-mirror.com/datasets/ltg/ultrafeedback-extended


Description:

---
configs:
- config_name: scores_only
  default: true
  data_files:
  - split: train
    path: data/scores_only/train.jsonl
- config_name: full_feedback
  data_files:
  - split: train
    path: data/full_feedback/train.jsonl
task_categories:
- text-generation
language:
- en
tags:
- preference
- dpo
- ultrafeedback
---

# UltraFeedback Extended

An extended version of [UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) with more responses per instruction and a diverse pool of LLM judges.

## Overview

The original [UltraFeedback](https://huggingface.co/datasets/openbmb/UltraFeedback) dataset pairs each instruction with 4 model responses scored by GPT-4. This dataset extends it in two ways:

1. **10 response models** (up from 4), using more recent and diverse LLMs.
2. **10 judge models** (instead of GPT-4 alone), each independently scoring every response on a 1 to 10 scale.

Importantly, the sets of generators and judges are **completely disjoint**, and both groups are chosen to be **diverse**, spanning different model families, sizes, and training approaches. This makes the dataset suitable for studying preference aggregation, reward model training, and the effect of judge diversity on alignment. All models used are open-weight, and some are fully open.

The dataset contains **63,875 instructions** from the same sources as UltraFeedback (EvolInstruct, ShareGPT, Flan, TruthfulQA, UltraChat, FalseQA).
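With ten judges scoring every response, the per-judge scores have to be combined before they can serve as a single preference signal. As a minimal sketch, assuming a toy mapping from response to per-judge scores (all response names, judge names, and numbers below are made up), here are two simple aggregation rules: the mean score per response, and a pairwise majority vote over the judges that scored both responses. Neither rule is claimed to be the one the dataset authors use.

```python
from statistics import mean

# Toy scores: response name -> {judge name -> score on the 1 to 10 scale}.
# All names and values are hypothetical, for illustration only.
scores = {
    "response_a": {"judge_1": 8, "judge_2": 7, "judge_3": 9},
    "response_b": {"judge_1": 6, "judge_2": 8, "judge_3": 5},
    "response_c": {"judge_1": 3, "judge_2": 4, "judge_3": 2},
}


def mean_scores(scores):
    """Average each response's scores over all judges that rated it."""
    return {resp: mean(judge_scores.values()) for resp, judge_scores in scores.items()}


def majority_prefers(scores, a, b):
    """Return a if more judges score a above b, b if the reverse, None on a tie.

    Only judges that scored both responses are counted; equal scores abstain.
    """
    shared = scores[a].keys() & scores[b].keys()
    wins_a = sum(scores[a][j] > scores[b][j] for j in shared)
    wins_b = sum(scores[b][j] > scores[a][j] for j in shared)
    if wins_a > wins_b:
        return a
    if wins_b > wins_a:
        return b
    return None


means = mean_scores(scores)
ranking = sorted(means, key=means.get, reverse=True)
print(ranking)  # ['response_a', 'response_b', 'response_c']
print(majority_prefers(scores, "response_a", "response_b"))  # response_a
```

Mean aggregation yields a full ranking but lets a single outlier judge shift it; the majority vote only counts which response each judge prefers, so it is less sensitive to one extreme score but can end in a tie.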
## Response models (generators)

- [01-ai/Yi-6B-Chat](https://huggingface.co/01-ai/Yi-6B-Chat)
- [HuggingFaceTB/SmolLM-1.7B-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-1.7B-Instruct)
- [Qwen/Qwen3-14B](https://huggingface.co/Qwen/Qwen3-14B)
- [deepseek-ai/deepseek-llm-7b-chat](https://huggingface.co/deepseek-ai/deepseek-llm-7b-chat)
- [google/gemma-3-4b-it](https://huggingface.co/google/gemma-3-4b-it)
- [internlm/internlm3-8b-instruct](https://huggingface.co/internlm/internlm3-8b-instruct)
- [mistralai/Ministral-8B-Instruct-2410](https://huggingface.co/mistralai/Ministral-8B-Instruct-2410)
- [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
- [google/gemma-3-12b-it](https://huggingface.co/google/gemma-3-12b-it)
- [swiss-ai/Apertus-8B-Instruct-2509](https://huggingface.co/swiss-ai/Apertus-8B-Instruct-2509)
- `original_ultrafeedback_response` (a randomly sampled response from the original UltraFeedback)

## Judge models

- [allenai/Olmo-3.1-32B-Instruct](https://huggingface.co/allenai/Olmo-3.1-32B-Instruct)
- [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)
- [meta-llama/Llama-4-Scout-17B-16E-Instruct](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct)
- [utter-project/EuroLLM-22B-Instruct-2512](https://huggingface.co/utter-project/EuroLLM-22B-Instruct-2512)
- [zai-org/GLM-4.5-Air](https://huggingface.co/zai-org/GLM-4.5-Air)
- [LumiOpen/Llama-Poro-2-70B-Instruct](https://huggingface.co/LumiOpen/Llama-Poro-2-70B-Instruct)
- [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b)
- [tokyotech-llm/GPT-OSS-Swallow-120B-RL-v0.1](https://huggingface.co/tokyotech-llm/GPT-OSS-Swallow-120B-RL-v0.1)
- [nvidia/NVLM-D-72B](https://huggingface.co/nvidia/NVLM-D-72B)
- [CohereLabs/aya-expanse-32b](https://huggingface.co/CohereLabs/aya-expanse-32b)

## Subsets

- **`scores_only`** (default): each annotation contains only the integer score (easier to use if you don't need the score justification).
- **`full_feedback`**: each annotation contains both the score and the full textual feedback from the judge.

```python
from datasets import load_dataset

ds = load_dataset("ltg/ultrafeedback-extended")  # scores_only (default config)
ds = load_dataset("ltg/ultrafeedback-extended", "full_feedback")  # full_feedback
```

## Data format

Each example has the following fields:

| Field | Description |
|---|---|
| `instruction_id` | Unique instruction identifier (from UltraFeedback) |
| `source` | Origin dataset (e.g. `evol_instruct`, `sharegpt`, `flan_v2_niv2`) |
| `instruction` | The prompt / instruction text |
| `models` | List of response model names |
| `completions` | List of response objects (see below) |

Each entry in `completions`:

| Field | Description |
|---|---|
| `model` | Name of the model that generated this response |
| `response` | The generated text |
| `annotations` | Dict mapping judge model name to `{"score": int}` (or `{"score": int, "feedback": str}` in `full_feedback`) |
| `ultrafeedback_annotations` | Original GPT-4 annotations from UltraFeedback (if available) |
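The fields above are enough to build DPO-style (prompt, chosen, rejected) triples. The sketch below assumes a simple rule: the completion with the highest mean judge score becomes `chosen` and the one with the lowest becomes `rejected`. It runs on a hand-written toy record shaped like the tables above instead of downloading the data; the helper `to_dpo_pair`, the model and judge names, and the decision to skip missing or `None` annotations are hypothetical choices for illustration, not the authors' recipe. Since `full_feedback` annotations also contain a `score` key, the same function should apply to either subset.

```python
from statistics import mean

# Toy record following the documented schema; all text, names and scores are made up.
example = {
    "instruction_id": "toy-0001",
    "source": "evol_instruct",
    "instruction": "Name a prime number greater than 10.",
    "models": ["model_x", "model_y", "model_z"],
    "completions": [
        {"model": "model_x", "response": "11",
         "annotations": {"judge_1": {"score": 9}, "judge_2": {"score": 8}}},
        {"model": "model_y", "response": "12",
         "annotations": {"judge_1": {"score": 2}, "judge_2": {"score": 3}}},
        {"model": "model_z", "response": "13 is prime.",
         "annotations": {"judge_1": {"score": 7},
                         "judge_2": {"score": 9, "feedback": "Correct."}}},
    ],
}


def to_dpo_pair(example):
    """Hypothetical helper: best vs. worst completion by mean judge score.

    Any "feedback" text in an annotation is ignored; only "score" is used.
    Returns None if fewer than two completions have usable scores.
    """
    scored = []
    for completion in example["completions"]:
        annotations = completion.get("annotations") or {}
        # Assumption: skip judges whose annotation or score is missing/None.
        judge_scores = [a["score"] for a in annotations.values()
                        if a and a.get("score") is not None]
        if judge_scores:
            scored.append((mean(judge_scores), completion["response"]))
    if len(scored) < 2:
        return None
    scored.sort(key=lambda item: item[0])  # ascending by mean score
    return {
        "prompt": example["instruction"],
        "chosen": scored[-1][1],
        "rejected": scored[0][1],
    }


pair = to_dpo_pair(example)
print(pair["chosen"], "|", pair["rejected"])  # 11 | 12
```

On the real data, one could iterate over `load_dataset("ltg/ultrafeedback-extended")["train"]`, call `to_dpo_pair` on each row, and drop rows where it returns `None`. Ties on the mean score are resolved by input order (Python's sort is stable), so you may want to filter out pairs whose chosen and rejected scores are equal.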

Provider:

ltg
