8F-ai/MMLUIndex

Name: 8F-ai/MMLUIndex
Creator: 8F-ai
Published: 2026-03-28 13:14:05
License: No description provided (the dataset card declares MIT)

Platform: Hugging Face
Updated: 2026-03-28
Indexed: 2026-04-12

Download link:

https://hf-mirror.com/datasets/8F-ai/MMLUIndex

Description:

---
license: mit
tags:
- human-feedback
- preference-modeling
- synthetic
- coding
- safety
- mmluindex
size_categories:
- 1M<n<10M
task_categories:
- text-generation
- text-classification
pretty_name: MMLUindex
---

# MMLUindex

## Dataset Summary

MMLUindex is a synthetic preference dataset for evaluating coding-focused and safety-focused assistants. The repository is structured for reward-model experiments, preference-model training, data-loader validation, and lightweight RLHF-style research workflows.

The dataset uses paired responses rather than single gold answers. Each example contains:

- a `chosen` response intended to be more helpful, safer, more honest, or better aligned with the user request
- a `rejected` response intended to be worse because of inaccuracy, poor reasoning, unsafe behavior, evasion, or low-quality execution

This repository contains **1,200,000** preference pairs in total.

## Dataset Structure

The data are distributed across four top-level subsets:

- `MMLUindex-coding-base`
- `MMLUindex-coding-online`
- `MMLUindex-coding-rejection-sampled`
- `MMLUindex-safety-base`

Each subset contains a `train/` directory with:

- `train.csv`
- `train.jsonl.gz`

The repository root also contains a combined file:

- `MMLUindex.jsonl`

## Subset Breakdown

Current row counts per subset (4 × 300,000 = 1,200,000, matching the total above):

- `MMLUindex-coding-base`: 300,000 rows
- `MMLUindex-coding-online`: 300,000 rows
- `MMLUindex-coding-rejection-sampled`: 300,000 rows
- `MMLUindex-safety-base`: 300,000 rows

The coding subsets emphasize tasks such as debugging, code explanation, structured problem-solving, writing assistance, and technical reasoning. The safety subset emphasizes refusals, uncertainty handling, boundary-setting, and responses to harmful or deceptive requests.
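Because each subset ships a gzip-compressed JSONL export, the row counts above can be spot-checked without the `datasets` library. The following is a minimal sketch using only the Python standard library. It assumes a local clone whose layout matches the Dataset Structure section, and the helper names `iter_pairs` and `count_rows` are hypothetical.

```python
import gzip
import json
from pathlib import Path
from typing import Iterator

# Subset folder names as listed in the Dataset Structure section.
SUBSETS = [
    "MMLUindex-coding-base",
    "MMLUindex-coding-online",
    "MMLUindex-coding-rejection-sampled",
    "MMLUindex-safety-base",
]


def iter_pairs(path: Path) -> Iterator[dict]:
    """Stream preference pairs, one per non-blank line, from a gzip-compressed JSONL file."""
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():  # skip blank lines such as a trailing newline
                yield json.loads(line)


def count_rows(root: Path) -> dict[str, int]:
    """Count rows per subset under <root>/<subset>/train/train.jsonl.gz (assumed local layout)."""
    return {
        subset: sum(1 for _ in iter_pairs(root / subset / "train" / "train.jsonl.gz"))
        for subset in SUBSETS
    }
```

Running `count_rows(Path("MMLUIndex"))` against a local clone (the directory name is an assumption) should report 300,000 rows per subset if the figures above are current. Streaming line by line keeps memory use flat, which matters at this dataset size.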
## Data Schema

Each row contains two text fields:

- `chosen`
- `rejected`

Both fields store the full conversation as a single string, using the same turn format:

```json
{
  "chosen": "\n\nHuman: <prompt>\n\nAssistant: <better response>",
  "rejected": "\n\nHuman: <prompt>\n\nAssistant: <worse response>"
}
```

Some records are single-turn and some are multi-turn; the formatting is consistent across both.

## Content Profile

The dataset covers:

- coding and software questions
- mathematics and reasoning tasks
- writing and rewriting requests
- factual explanation and comparison prompts
- safety-sensitive refusal cases
- honesty and uncertainty calibration examples

The responses were designed so that the preferred answer is meaningfully better than the rejected one, not merely stylistically different.

## Intended Use

MMLUindex is suitable for:

- preference-model and reward-model prototyping
- loader and preprocessing pipeline testing
- small-scale alignment experiments
- evaluation of refusal quality and safe redirection
- experiments comparing structured assistant outputs against weaker alternatives

## Out of Scope

MMLUindex is **not** intended as:

- a benchmark for final model capability claims
- a replacement for large human-collected alignment corpora
- a source of verified real-world labels for high-stakes deployment
- a production-scale foundation-model training corpus

## Loading Examples

The examples below use paths relative to a local copy of the repository.

Load a CSV subset:

```python
from datasets import load_dataset

# Read one subset's CSV export into a single "train" split.
dataset = load_dataset(
    "csv",
    data_files="MMLUindex-coding-base/train/train.csv",
    split="train",
)
```

Load a compressed JSONL subset:

```python
from datasets import load_dataset

# The JSON loader reads the gzip-compressed file directly.
dataset = load_dataset(
    "json",
    data_files="MMLUindex-coding-base/train/train.jsonl.gz",
    split="train",
)
```

Load the full combined JSONL file:

```python
from datasets import load_dataset

# The root file combines all four subsets.
dataset = load_dataset(
    "json",
    data_files="MMLUindex.jsonl",
    split="train",
)
```

## Quality Notes

The dataset was checked for:
- valid JSONL structure
- consistent `chosen` / `rejected` keys
- readable gzip subset files
- matching CSV and JSONL subset row counts
- stable conversation formatting, with each string opening on a `Human:` turn followed by an `Assistant:` turn

## Limitations

- The dataset is synthetic rather than human-annotated end to end.
- Although larger than earlier versions of the repository, the dataset is still not organically collected from live annotator preference workflows.
- Folder names are organizational labels, not provenance claims about the collection method.
- Safety coverage is broad but not exhaustive.

## Safety Notice

Some subset rows include harmful, deceptive, or distressing prompts in order to model better and worse assistant behavior. These examples are included for alignment and safety evaluation, not to endorse such content.

## Version Notes

Current repository characteristics:

- 1,200,000 total preference pairs
- four subset folders
- CSV and compressed JSONL training exports
- one combined root JSONL export
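The checks listed under Quality Notes can be approximated with the standard library. The following sketch is an illustration, not the maintainers' validation script. `split_turns`, `check_record`, and `csv_row_count` are hypothetical helpers, and the turn markers are assumed to be exactly `\n\nHuman: ` and `\n\nAssistant: ` as in the Data Schema example. The shared-prompt test further assumes that `chosen` and `rejected` differ only in their final Assistant turn. The schema example suggests this, but the card does not state it for multi-turn records.

```python
import csv
import json
import re

# Assumed turn markers, taken from the Data Schema example ("\n\nHuman: " / "\n\nAssistant: ").
TURN_RE = re.compile(r"\n\n(Human|Assistant): ")


def split_turns(conversation: str) -> list[tuple[str, str]]:
    """Split one conversation string into (role, text) turns (hypothetical helper)."""
    # With one capture group, re.split returns [prefix, role, text, role, text, ...].
    parts = TURN_RE.split(conversation)
    if parts[0] != "":
        raise ValueError("conversation does not start with a turn marker")
    return [(parts[i], parts[i + 1]) for i in range(1, len(parts), 2)]


def check_record(line: str) -> list[str]:
    """Return the problems found in one JSONL line; an empty list means it passed."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(record, dict) or set(record) != {"chosen", "rejected"}:
        return ["expected exactly the keys 'chosen' and 'rejected'"]

    problems = []
    prompts = {}
    for key in ("chosen", "rejected"):
        value = record[key]
        if not isinstance(value, str):
            problems.append(f"{key}: not a string")
            continue
        try:
            turns = split_turns(value)
        except ValueError as exc:
            problems.append(f"{key}: {exc}")
            continue
        roles = [role for role, _ in turns]
        # Expect Human/Assistant pairs: opens with Human, ends with Assistant.
        if not roles or roles != ["Human", "Assistant"] * (len(roles) // 2):
            problems.append(f"{key}: turns do not alternate Human/Assistant")
            continue
        prompts[key] = turns[:-1]  # every turn before the final Assistant response

    # Assumption: both fields share all turns except the final response.
    if len(prompts) == 2 and prompts["chosen"] != prompts["rejected"]:
        problems.append("chosen and rejected do not share the same prompt turns")
    return problems


def csv_row_count(path: str) -> int:
    """Count data rows in a subset CSV, assuming a header row and quoted multi-line fields."""
    with open(path, newline="", encoding="utf-8") as handle:
        return sum(1 for _ in csv.DictReader(handle))
```

To run the checks, apply `check_record` to each line of `MMLUindex.jsonl` (or of a decompressed `train.jsonl.gz`), then compare `csv_row_count` for each `train.csv` against the matching JSONL row count. Splitting on turn markers misfires if a response itself contains the literal text `\n\nHuman: `. Treat flagged records as candidates for review rather than confirmed errors.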

Provider:

8F-ai
