iknow-lab/JudgeBias-DPO-RefFree

Name: iknow-lab/JudgeBias-DPO-RefFree
Creator: iknow-lab
Published: 2026-03-12 12:57:54
License: No description available

Source: Hugging Face · Updated 2026-03-12 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/iknow-lab/JudgeBias-DPO-RefFree

Resource description:

---
language:
- en
size_categories:
- 100K<n<1M
task_categories:
- text-generation
tags:
- dpo
- preference
- llm-as-a-judge
- debiasing
- materials-science
dataset_info:
  features:
  - name: prompt
    dtype: string
  - name: chosen
    dtype: string
  - name: rejected
    dtype: string
  - name: score_chosen
    dtype: float64
  - name: score_rejected
    dtype: float64
  - name: score_delta
    dtype: float64
  - name: anchor_score
    dtype: float64
  - name: sample_id
    dtype: int64
  - name: perturbation_type
    dtype: string
  - name: perturbation_category
    dtype: string
  - name: perturbation_rate
    dtype: float64
  - name: chosen_model
    dtype: string
  - name: rejected_model
    dtype: string
  splits:
  - name: train
    num_bytes: 525031934
    num_examples: 91639
  - name: validation
    num_bytes: 58266972
    num_examples: 10183
  config_name: default
configs:
- config_name: default
  data_files:
  - split: train
    path: train.parquet
  - split: validation
    path: validation.parquet
---

# JudgeBias-DPO: Reference-Free Judge Debiasing Dataset

A DPO dataset for training LLM judges to evaluate materials science synthesis recipes without bias in a **reference-free** setting (no ground-truth recipe).

## Motivation

LLM-as-a-Judge models exhibit systematic biases when evaluating AI-generated synthesis recipes:

- **Representational bias**: Penalizing semantically equivalent surface-form changes (e.g., chemical formula vs. IUPAC name)
- **Error insensitivity**: Failing to detect injected scientific errors (e.g., element substitutions, wrong temperatures)

This dataset trains judges to be **invariant to representational changes** while remaining **sensitive to scientific errors**.

## Construction: Anchor-Consensus

**Source**: 2,000 samples from [AlchemyBench](https://github.com/AiChemistLab/AlchemyBench), evaluated by 4 judge models (Qwen3-8B, Qwen3-32B, Llama-3.1-8B-Instruct, gemini-2.5-flash) across 17 perturbation datasets (9 error + 8 representational).
**Anchor score**: A robust per-sample quality estimate, computed as the median over 4 models × 5 representational perturbation rates, i.e., up to 20 evaluations per sample.

**Direction-aware pairing**: For each of the C(4,2) = 6 model pairs per sample:

- **Representational** (meaning preserved): `chosen` = higher score (closer to the anchor), `rejected` = lower score
- **Error** (errors injected): `chosen` = lower score (errors detected), `rejected` = higher score (errors missed)

**Filtering**: score delta ≥ 0.5, an anchor-based quality filter, at most 5 pairs per sample per dataset, and SHA-256 deduplication.

## Dataset Format

Compatible with the [TRL DPOTrainer](https://huggingface.co/docs/trl/dpo_trainer) conversational format.

| Field | Description |
|---|---|
| `prompt` | `[{system: judge_prompt}, {user: evaluation_request}]` (JSON string) |
| `chosen` | `[{assistant: unbiased_evaluation}]` (JSON string) |
| `rejected` | `[{assistant: biased_evaluation}]` (JSON string) |
| `score_chosen` / `score_rejected` | Overall score (1-5) of the chosen / rejected evaluation |
| `score_delta` | Absolute difference between `score_chosen` and `score_rejected` |
| `anchor_score` | Per-sample anchor from representational consensus |
| `perturbation_category` | `error` or `represent` |

## Statistics

| Metric | Value |
|---|---|
| Total pairs | 101,879 |
| Train / Validation | 91,639 / 10,183 |
| Error / Representational | 59,049 (58%) / 42,830 (42%) |
| Unique samples | 2,000 |
| Score delta | mean 1.06, median 0.9 |

## Usage

```python
from datasets import load_dataset

# Downloads both splits (train.parquet, validation.parquet) from the Hub
dataset = load_dataset("iknow-lab/JudgeBias-DPO-RefFree")
train = dataset["train"]     # 91,639 pairs
val = dataset["validation"]  # 10,183 pairs
```
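The `prompt`, `chosen`, and `rejected` columns are stored as JSON strings. They therefore need to be decoded into message lists before they reach a trainer that expects the conversational format.

The sketch below is minimal and rests on two assumptions. First, it assumes each decoded message is a `{"role": ..., "content": ...}` dict, as in TRL's conversational format; the card abbreviates messages as `{system: ...}`, so the exact keys are an assumption. Second, `decode_messages` and `to_conversational` are hypothetical helper names.

```python
import json
from typing import Any


def decode_messages(raw: str) -> list[dict[str, str]]:
    """Parse a JSON-encoded message list and check its shape.

    Assumption: each message is a {"role": ..., "content": ...} dict,
    as in TRL's conversational format.
    """
    messages = json.loads(raw)
    if not isinstance(messages, list):
        raise ValueError("expected a JSON list of messages")
    for msg in messages:
        if not isinstance(msg, dict) or "role" not in msg or "content" not in msg:
            raise ValueError(f"unexpected message shape: {msg!r}")
    return messages


def to_conversational(example: dict[str, Any]) -> dict[str, Any]:
    """Replace the JSON-string prompt/chosen/rejected fields with decoded lists."""
    return {
        "prompt": decode_messages(example["prompt"]),
        "chosen": decode_messages(example["chosen"]),
        "rejected": decode_messages(example["rejected"]),
    }
```

With the `datasets` library, this can be applied to a whole split via `train.map(to_conversational)`. The returned keys overwrite the original string columns.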
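The anchor-consensus construction described under Construction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' pipeline:

- `anchor_score`, `make_pairs`, and `dedup` are hypothetical names.
- The anchor-based quality filter is omitted because the card does not specify it.
- When capping at 5 pairs, keeping the pairs with the largest score gap is an assumption.
- Hashing the whole serialized pair for SHA-256 deduplication is also an assumption.

```python
import hashlib
import json
from itertools import combinations
from statistics import median

MIN_DELTA = 0.5  # card: score delta >= 0.5
MAX_PAIRS = 5    # card: max 5 pairs per sample per dataset


def anchor_score(representational_scores: list[float]) -> float:
    """Robust per-sample anchor: median of up to 20 representational scores
    (4 judge models x 5 perturbation rates)."""
    return median(representational_scores)


def make_pairs(scores: dict[str, float], category: str) -> list[dict]:
    """Direction-aware pairs for one sample on one perturbation dataset.

    `scores` maps judge model name -> overall score (1-5);
    `category` is "represent" or "error", as in `perturbation_category`.
    """
    pairs = []
    for a, b in combinations(sorted(scores), 2):  # C(4,2) = 6 pairs for 4 models
        delta = abs(scores[a] - scores[b])
        if delta < MIN_DELTA:
            continue  # scores too close to carry a preference signal
        high, low = (a, b) if scores[a] > scores[b] else (b, a)
        if category == "represent":
            chosen, rejected = high, low  # meaning preserved: higher score wins
        elif category == "error":
            chosen, rejected = low, high  # errors injected: lower score wins
        else:
            raise ValueError(f"unknown category: {category}")
        pairs.append({
            "chosen_model": chosen,
            "rejected_model": rejected,
            "score_chosen": scores[chosen],
            "score_rejected": scores[rejected],
            "score_delta": delta,
        })
    # Assumption: when capping, keep the pairs with the largest score gap.
    pairs.sort(key=lambda p: p["score_delta"], reverse=True)
    return pairs[:MAX_PAIRS]


def dedup(pairs: list[dict]) -> list[dict]:
    """Drop exact duplicates by SHA-256 of each pair's canonical JSON.
    Which fields the real pipeline hashes is an assumption."""
    seen: set[str] = set()
    unique = []
    for pair in pairs:
        key = hashlib.sha256(
            json.dumps(pair, sort_keys=True).encode("utf-8")
        ).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(pair)
    return unique
```

For an error-category sample, `make_pairs` makes `chosen` the judge that scored the perturbed recipe lower, i.e., the one that noticed the injected error. For a representational sample the direction flips.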

Provided by:

iknow-lab