five

iknow-lab/JudgeBias-DPO-RefFree-subset

收藏
Hugging Face2026-03-04 更新2026-04-05 收录
下载链接:
https://hf-mirror.com/datasets/iknow-lab/JudgeBias-DPO-RefFree-subset
下载链接
链接失效反馈
官方服务:
资源简介:
--- language: - en size_categories: - 10K<n<100K task_categories: - text-generation tags: - dpo - preference - llm-as-a-judge - debiasing - materials-science dataset_info: features: - name: prompt dtype: string - name: chosen dtype: string - name: rejected dtype: string - name: score_chosen dtype: float64 - name: score_rejected dtype: float64 - name: score_delta dtype: float64 - name: anchor_score dtype: float64 - name: sample_id dtype: int64 - name: perturbation_type dtype: string - name: perturbation_category dtype: string - name: perturbation_rate dtype: float64 - name: chosen_model dtype: string - name: rejected_model dtype: string splits: - name: train num_bytes: 276261273 num_examples: 48151 - name: validation num_bytes: 30399476 num_examples: 5294 config_name: default configs: - config_name: default data_files: - split: train path: train.parquet - split: validation path: validation.parquet --- # JudgeBias-DPO-RefFree-subset A **subset** of [JudgeBias-DPO-RefFree](https://huggingface.co/datasets/iknow-lab/JudgeBias-DPO-RefFree) for training LLM judges to evaluate materials science synthesis recipes without bias in a **reference-free** setting (no ground truth recipe). ## Subset Selection This dataset keeps only the **15% perturbation rate** for graded perturbations and all **100% directional/individual** datasets, removing the 1%, 2%, 5%, and 10% rate variants: | Kept | Removed | |---|---| | `all_error_perturbation_15pct` | `all_error_perturbation_{1,2,5,10}pct` | | `llm_representational_perturbation_15pct` | `llm_representational_perturbation_{1,2,5,10}pct` | | `element_substitution_100pct` | — | | `numerical_perturbation_100pct` | — | | `equipment_substitution_100pct` | — | | `action_antonym_100pct` | — | | `llm_to_formula_100pct` | — | | `llm_to_name_100pct` | — | | `llm_to_iupac_100pct` | — | ## Motivation LLM-as-a-Judge models exhibit systematic biases when evaluating AI-generated synthesis recipes: - **Representational bias**: Penalizing semantically equivalent surface-form changes (e.g., chemical formula vs. IUPAC name) - **Error insensitivity**: Failing to detect injected scientific errors (e.g., element substitutions, wrong temperatures) This dataset trains judges to be **invariant to representational changes** while remaining **sensitive to scientific errors**. ## Construction: Anchor-Consensus **Source**: 2,000 samples from [AlchemyBench](https://github.com/AiChemistLab/AlchemyBench), evaluated by 4 judge models (Qwen3-8B, Qwen3-32B, Llama-3.1-8B-Instruct, gemini-2.5-flash) across 9 perturbation datasets (5 error + 4 representational). **Anchor score**: Per-sample robust quality estimate computed as `median(4 models × 5 representational rates)` — up to 20 evaluations per sample. **Direction-aware pairing**: For each C(4,2)=6 model pair per sample: - **Representational** (meaning preserved): `chosen` = higher score (closer to anchor), `rejected` = lower score - **Error** (errors injected): `chosen` = lower score (detected errors), `rejected` = higher score (missed errors) **Filtering**: score delta >= 0.5, anchor-based quality filter, max 5 pairs per sample per dataset, SHA-256 dedup. ## Dataset Format Compatible with [TRL DPOTrainer](https://huggingface.co/docs/trl/dpo_trainer) conversational format. | Field | Description | |---|---| | `prompt` | `[{system: judge_prompt}, {user: evaluation_request}]` (JSON string) | | `chosen` | `[{assistant: unbiased_evaluation}]` (JSON string) | | `rejected` | `[{assistant: biased_evaluation}]` (JSON string) | | `score_chosen/rejected` | Overall score (1-5) | | `score_delta` | Absolute score difference | | `anchor_score` | Per-sample anchor from representational consensus | | `perturbation_category` | `error` or `represent` | ## Statistics | Metric | Value | |---|---| | Total pairs | 53,445 | | Train / Validation | 48,151 / 5,294 | | Error / Representational | 31,814 (60%) / 21,631 (40%) | | Unique samples | 2,000 | | Score delta | mean=1.05, median=0.9 | ## Usage ```python from datasets import load_dataset dataset = load_dataset("iknow-lab/JudgeBias-DPO-RefFree-subset") train = dataset["train"] val = dataset["validation"] ```
提供机构:
iknow-lab
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作