iknow-lab/JudgeBias-DPO-RefFree-subset
收藏Hugging Face2026-03-04 更新2026-04-05 收录
下载链接:
https://hf-mirror.com/datasets/iknow-lab/JudgeBias-DPO-RefFree-subset
下载链接
链接失效反馈官方服务:
资源简介:
---
language:
- en
size_categories:
- 10K<n<100K
task_categories:
- text-generation
tags:
- dpo
- preference
- llm-as-a-judge
- debiasing
- materials-science
dataset_info:
features:
- name: prompt
dtype: string
- name: chosen
dtype: string
- name: rejected
dtype: string
- name: score_chosen
dtype: float64
- name: score_rejected
dtype: float64
- name: score_delta
dtype: float64
- name: anchor_score
dtype: float64
- name: sample_id
dtype: int64
- name: perturbation_type
dtype: string
- name: perturbation_category
dtype: string
- name: perturbation_rate
dtype: float64
- name: chosen_model
dtype: string
- name: rejected_model
dtype: string
splits:
- name: train
num_bytes: 276261273
num_examples: 48151
- name: validation
num_bytes: 30399476
num_examples: 5294
config_name: default
configs:
- config_name: default
data_files:
- split: train
path: train.parquet
- split: validation
path: validation.parquet
---
# JudgeBias-DPO-RefFree-subset
A **subset** of [JudgeBias-DPO-RefFree](https://huggingface.co/datasets/iknow-lab/JudgeBias-DPO-RefFree) for training LLM judges to evaluate materials science synthesis recipes without bias in a **reference-free** setting (no ground truth recipe).
## Subset Selection
This dataset keeps only the **15% perturbation rate** for graded perturbations and all **100% directional/individual** datasets, removing the 1%, 2%, 5%, and 10% rate variants:
| Kept | Removed |
|---|---|
| `all_error_perturbation_15pct` | `all_error_perturbation_{1,2,5,10}pct` |
| `llm_representational_perturbation_15pct` | `llm_representational_perturbation_{1,2,5,10}pct` |
| `element_substitution_100pct` | — |
| `numerical_perturbation_100pct` | — |
| `equipment_substitution_100pct` | — |
| `action_antonym_100pct` | — |
| `llm_to_formula_100pct` | — |
| `llm_to_name_100pct` | — |
| `llm_to_iupac_100pct` | — |
## Motivation
LLM-as-a-Judge models exhibit systematic biases when evaluating AI-generated synthesis recipes:
- **Representational bias**: Penalizing semantically equivalent surface-form changes (e.g., chemical formula vs. IUPAC name)
- **Error insensitivity**: Failing to detect injected scientific errors (e.g., element substitutions, wrong temperatures)
This dataset trains judges to be **invariant to representational changes** while remaining **sensitive to scientific errors**.
## Construction: Anchor-Consensus
**Source**: 2,000 samples from [AlchemyBench](https://github.com/AiChemistLab/AlchemyBench), evaluated by 4 judge models (Qwen3-8B, Qwen3-32B, Llama-3.1-8B-Instruct, gemini-2.5-flash) across 9 perturbation datasets (5 error + 4 representational).
**Anchor score**: Per-sample robust quality estimate computed as `median(4 models × 5 representational rates)` — up to 20 evaluations per sample.
**Direction-aware pairing**: For each C(4,2)=6 model pair per sample:
- **Representational** (meaning preserved): `chosen` = higher score (closer to anchor), `rejected` = lower score
- **Error** (errors injected): `chosen` = lower score (detected errors), `rejected` = higher score (missed errors)
**Filtering**: score delta >= 0.5, anchor-based quality filter, max 5 pairs per sample per dataset, SHA-256 dedup.
## Dataset Format
Compatible with [TRL DPOTrainer](https://huggingface.co/docs/trl/dpo_trainer) conversational format.
| Field | Description |
|---|---|
| `prompt` | `[{system: judge_prompt}, {user: evaluation_request}]` (JSON string) |
| `chosen` | `[{assistant: unbiased_evaluation}]` (JSON string) |
| `rejected` | `[{assistant: biased_evaluation}]` (JSON string) |
| `score_chosen/rejected` | Overall score (1-5) |
| `score_delta` | Absolute score difference |
| `anchor_score` | Per-sample anchor from representational consensus |
| `perturbation_category` | `error` or `represent` |
## Statistics
| Metric | Value |
|---|---|
| Total pairs | 53,445 |
| Train / Validation | 48,151 / 5,294 |
| Error / Representational | 31,814 (60%) / 21,631 (40%) |
| Unique samples | 2,000 |
| Score delta | mean=1.05, median=0.9 |
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("iknow-lab/JudgeBias-DPO-RefFree-subset")
train = dataset["train"]
val = dataset["validation"]
```
提供机构:
iknow-lab



