iknow-lab/JudgeBias-DPO-RefFree-subset-10k

Name: iknow-lab/JudgeBias-DPO-RefFree-subset-10k
Creator: iknow-lab
Published: 2026-03-04 09:35:23
License: No description available

Hugging Face | Updated 2026-03-04 | Indexed 2026-04-05

Download link:

https://hf-mirror.com/datasets/iknow-lab/JudgeBias-DPO-RefFree-subset-10k

Resource overview:

---
language:
- en
size_categories:
- 1K<n<10K
task_categories:
- text-generation
tags:
- dpo
- preference
- llm-as-a-judge
- debiasing
- materials-science
dataset_info:
  features:
  - name: prompt
    dtype: string
  - name: chosen
    dtype: string
  - name: rejected
    dtype: string
  - name: score_chosen
    dtype: float64
  - name: score_rejected
    dtype: float64
  - name: score_delta
    dtype: float64
  - name: anchor_score
    dtype: float64
  - name: sample_id
    dtype: int64
  - name: perturbation_type
    dtype: string
  - name: perturbation_category
    dtype: string
  - name: perturbation_rate
    dtype: float64
  - name: chosen_model
    dtype: string
  - name: rejected_model
    dtype: string
  splits:
  - name: train
    num_examples: 9000
  - name: validation
    num_examples: 1000
  config_name: default
configs:
- config_name: default
  data_files:
  - split: train
    path: train.parquet
  - split: validation
    path: validation.parquet
---

# JudgeBias-DPO-RefFree-subset-10k

A **10K-pair subset** of [JudgeBias-DPO-RefFree-subset](https://huggingface.co/datasets/iknow-lab/JudgeBias-DPO-RefFree-subset) for training LLM judges to evaluate materials science synthesis recipes without bias in a **reference-free** setting (no ground truth recipe).

## Sampling Strategy: Stratified Dataset + Top Delta per Sample

1. **Equal quota per dataset**: 9 datasets × ~1,111 pairs = 10,000 total
2. **Within each dataset**: for each `sample_id`, pairs are ranked by `score_delta` (descending) and selected in round-robin order: the highest-delta pair from every sample first, then the second-best, etc.
3. This maximizes **dataset balance**, **sample diversity** (1,952 unique samples), and **learning signal quality** (mean delta 1.60 vs 1.05 in full set).
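
The round-robin selection in step 2 can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' actual script: the function name `select_round_robin` and the toy rows are hypothetical, and the rule for ordering pairs within one round (largest `score_delta` first when the quota cuts a round short) is an assumption the card does not spell out.

```python
from collections import defaultdict


def select_round_robin(pairs, quota):
    """Pick up to `quota` pairs: the best pair of every sample first,
    then every sample's second-best pair, and so on."""
    # Group pairs by the sample they were derived from.
    by_sample = defaultdict(list)
    for pair in pairs:
        by_sample[pair["sample_id"]].append(pair)

    # Rank each sample's pairs by score_delta, highest first.
    for ranked in by_sample.values():
        ranked.sort(key=lambda p: p["score_delta"], reverse=True)

    selected = []
    rank = 0
    while len(selected) < quota:
        # Collect the rank-th best pair of every sample that still has one.
        tier = [ranked[rank] for ranked in by_sample.values() if len(ranked) > rank]
        if not tier:
            break  # every sample is exhausted
        # Assumption: when the quota ends mid-round, keep the largest deltas.
        tier.sort(key=lambda p: p["score_delta"], reverse=True)
        selected.extend(tier[: quota - len(selected)])
        rank += 1
    return selected


# Toy rows using the card's column names (values are made up).
toy_pairs = [
    {"sample_id": 1, "score_delta": 2.0},
    {"sample_id": 1, "score_delta": 1.0},
    {"sample_id": 2, "score_delta": 0.5},
    {"sample_id": 3, "score_delta": 1.5},
    {"sample_id": 3, "score_delta": 1.4},
]
print([p["score_delta"] for p in select_round_robin(toy_pairs, quota=4)])
```

Applied once per source dataset with a quota of roughly 1,111, a procedure like this would cover every sample before any sample contributes a second pair, which is how the card's emphasis on sample diversity can be read.
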
## Subset Composition

| Dataset | Category | Pairs | Unique Samples | Mean Delta |
|---|---|---|---|---|
| `action_antonym_100pct` | error | 1,112 | 1,112 | 1.40 |
| `all_error_perturbation_15pct` | error | 1,111 | 1,111 | 2.33 |
| `element_substitution_100pct` | error | 1,111 | 1,111 | 1.68 |
| `equipment_substitution_100pct` | error | 1,111 | 1,111 | 1.58 |
| `numerical_perturbation_100pct` | error | 1,111 | 1,111 | 1.70 |
| `llm_representational_perturbation_15pct` | represent | 1,111 | 1,111 | 1.47 |
| `llm_to_formula_100pct` | represent | 1,111 | 1,111 | 1.33 |
| `llm_to_iupac_100pct` | represent | 1,111 | 1,111 | 1.47 |
| `llm_to_name_100pct` | represent | 1,111 | 1,111 | 1.41 |

## Statistics

| Metric | Value |
|---|---|
| Total pairs | 10,000 |
| Train / Validation | 9,000 / 1,000 |
| Error / Representational | 5,556 (56%) / 4,444 (44%) |
| Unique samples | 1,952 |
| Score delta | mean=1.60, median=1.50 |

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("iknow-lab/JudgeBias-DPO-RefFree-subset-10k")
train = dataset["train"]
val = dataset["validation"]
```
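
After loading, a quick sanity check is to recompute per-dataset pair counts and mean deltas and compare them with the Subset Composition table. The sketch below is a hypothetical helper (`summarize_by_type` is not part of the dataset or the `datasets` library). It assumes the `perturbation_type` column holds the names shown in the table's Dataset column, which the card does not state explicitly, and that the table was computed over train and validation combined.

```python
from collections import defaultdict


def summarize_by_type(rows):
    """Count pairs and average score_delta per perturbation_type.

    `rows` is any iterable of dict-like records with the card's
    `perturbation_type` and `score_delta` fields.
    """
    totals = defaultdict(lambda: [0, 0.0])  # name -> [pair count, delta sum]
    for row in rows:
        entry = totals[row["perturbation_type"]]
        entry[0] += 1
        entry[1] += row["score_delta"]
    return {
        name: {"pairs": count, "mean_delta": round(delta_sum / count, 2)}
        for name, (count, delta_sum) in totals.items()
    }


# Toy rows with made-up values; real rows would come from the loaded splits.
toy_rows = [
    {"perturbation_type": "llm_to_name_100pct", "score_delta": 1.0},
    {"perturbation_type": "llm_to_name_100pct", "score_delta": 2.0},
    {"perturbation_type": "action_antonym_100pct", "score_delta": 0.5},
]
print(summarize_by_type(toy_rows))
```

Iterating a split such as `dataset["train"]` yields one dict per row, so the same function can be fed the train and validation rows chained together (for example with `itertools.chain`).
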

Provided by:

iknow-lab
