iknow-lab/JudgeBias-DPO-RefFree-subset-10k
Source: Hugging Face. Updated 2026-03-04; indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/iknow-lab/JudgeBias-DPO-RefFree-subset-10k
Resource description:
---
language:
- en
size_categories:
- 1K<n<10K
task_categories:
- text-generation
tags:
- dpo
- preference
- llm-as-a-judge
- debiasing
- materials-science
dataset_info:
  features:
  - name: prompt
    dtype: string
  - name: chosen
    dtype: string
  - name: rejected
    dtype: string
  - name: score_chosen
    dtype: float64
  - name: score_rejected
    dtype: float64
  - name: score_delta
    dtype: float64
  - name: anchor_score
    dtype: float64
  - name: sample_id
    dtype: int64
  - name: perturbation_type
    dtype: string
  - name: perturbation_category
    dtype: string
  - name: perturbation_rate
    dtype: float64
  - name: chosen_model
    dtype: string
  - name: rejected_model
    dtype: string
  splits:
  - name: train
    num_examples: 9000
  - name: validation
    num_examples: 1000
  config_name: default
configs:
- config_name: default
  data_files:
  - split: train
    path: train.parquet
  - split: validation
    path: validation.parquet
---
# JudgeBias-DPO-RefFree-subset-10k
A **10K-pair subset** of [JudgeBias-DPO-RefFree-subset](https://huggingface.co/datasets/iknow-lab/JudgeBias-DPO-RefFree-subset) for training LLM judges to evaluate materials-science synthesis recipes without bias in a **reference-free** setting (i.e., no ground-truth recipe is given to the judge).
## Sampling Strategy: Stratified by Dataset + Top Delta per Sample
1. **Equal quota per dataset**: 9 datasets × ~1,111 pairs = 10,000 pairs total (one dataset receives 1,112 pairs so the total comes out exactly).
2. **Within each dataset**: for each `sample_id`, pairs are ranked by `score_delta` (descending) and selected in round-robin order: the highest-delta pair from every sample first, then the second-best, and so on.
3. This design favors **dataset balance**, **sample diversity** (1,952 unique samples), and **learning-signal strength** (mean `score_delta` of 1.60 vs. 1.05 in the full set).
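The selection procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' script: `split_quota` and `round_robin_top_delta` are hypothetical names, pairs are assumed to be dicts with `sample_id` and `score_delta` keys, and the ordering of pairs *within* a round (largest delta first) is an assumption the card does not specify.

```python
from collections import defaultdict
from itertools import zip_longest


def split_quota(total, names):
    """Hypothetical helper: split `total` pairs as evenly as possible across datasets."""
    base, extra = divmod(total, len(names))
    # The first `extra` datasets each receive one additional pair.
    return {name: base + (1 if i < extra else 0) for i, name in enumerate(names)}


def round_robin_top_delta(pairs, quota):
    """Hypothetical helper: pick up to `quota` pairs from one dataset.

    Round 0 takes the highest-delta pair of every sample, round 1 the
    second-highest, and so on, until the quota is filled.
    """
    by_sample = defaultdict(list)
    for pair in pairs:
        by_sample[pair["sample_id"]].append(pair)
    # Rank each sample's pairs by score_delta, highest first.
    ranked = [
        sorted(group, key=lambda p: p["score_delta"], reverse=True)
        for group in by_sample.values()
    ]
    selected = []
    # zip_longest yields one tuple per round; exhausted samples contribute None.
    for round_pairs in zip_longest(*ranked):
        # Assumption: within a round, larger deltas are taken first.
        candidates = sorted(
            (p for p in round_pairs if p is not None),
            key=lambda p: p["score_delta"],
            reverse=True,
        )
        for pair in candidates:
            if len(selected) == quota:
                return selected
            selected.append(pair)
    return selected


# Toy example: three samples with different numbers of candidate pairs.
toy = [
    {"sample_id": 1, "score_delta": 2.0},
    {"sample_id": 1, "score_delta": 0.5},
    {"sample_id": 2, "score_delta": 1.0},
    {"sample_id": 3, "score_delta": 3.0},
    {"sample_id": 3, "score_delta": 1.5},
]
picked = round_robin_top_delta(toy, quota=4)
print([p["score_delta"] for p in picked])  # [3.0, 2.0, 1.0, 1.5]
print(split_quota(10_000, [f"dataset_{i}" for i in range(9)]))
```

Every sample contributes its best pair before any sample contributes a second one, which is what spreads each dataset's quota across many distinct `sample_id`s.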
## Subset Composition
| Dataset | Category | Pairs | Unique Samples | Mean Delta |
|---|---|---|---|---|
| `action_antonym_100pct` | error | 1,112 | 1,112 | 1.40 |
| `all_error_perturbation_15pct` | error | 1,111 | 1,111 | 2.33 |
| `element_substitution_100pct` | error | 1,111 | 1,111 | 1.68 |
| `equipment_substitution_100pct` | error | 1,111 | 1,111 | 1.58 |
| `numerical_perturbation_100pct` | error | 1,111 | 1,111 | 1.70 |
| `llm_representational_perturbation_15pct` | represent | 1,111 | 1,111 | 1.47 |
| `llm_to_formula_100pct` | represent | 1,111 | 1,111 | 1.33 |
| `llm_to_iupac_100pct` | represent | 1,111 | 1,111 | 1.47 |
| `llm_to_name_100pct` | represent | 1,111 | 1,111 | 1.41 |
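Assuming the **Dataset** column corresponds to the `perturbation_type` field and **Category** to `perturbation_category` (the card does not state this mapping explicitly), a per-dataset summary like the table above could be recomputed from the rows. `composition` is a hypothetical helper; since the table covers all 10,000 pairs, it would need both splits, e.g. `list(dataset["train"]) + list(dataset["validation"])` (iterating a `datasets.Dataset` yields one dict per row).

```python
from collections import defaultdict
from statistics import mean


def composition(rows):
    """Hypothetical helper: per-perturbation-type summary of preference pairs.

    Assumes each row is a dict with `perturbation_type`, `perturbation_category`,
    `sample_id`, and `score_delta` keys, as listed in the card's features.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["perturbation_type"]].append(row)
    table = {}
    for name, group in sorted(groups.items()):
        table[name] = {
            # Assumes every row of one type shares a single category.
            "category": group[0]["perturbation_category"],
            "pairs": len(group),
            "unique_samples": len({r["sample_id"] for r in group}),
            "mean_delta": round(mean(r["score_delta"] for r in group), 2),
        }
    return table


toy = [
    {"perturbation_type": "a", "perturbation_category": "error", "sample_id": 1, "score_delta": 1.0},
    {"perturbation_type": "a", "perturbation_category": "error", "sample_id": 2, "score_delta": 2.0},
    {"perturbation_type": "b", "perturbation_category": "represent", "sample_id": 1, "score_delta": 0.5},
]
for name, stats in composition(toy).items():
    print(name, stats)
```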
## Statistics
| Metric | Value |
|---|---|
| Total pairs | 10,000 |
| Train / Validation | 9,000 / 1,000 |
| Error / Representational | 5,556 (56%) / 4,444 (44%) |
| Unique samples | 1,952 |
| Score delta | mean=1.60, median=1.50 |
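The headline figures can be cross-checked against the composition table. The sketch below recomputes the totals and the pair-weighted mean of the per-dataset mean deltas; because the per-dataset means are rounded to two decimals, the result (about 1.597) matches the reported 1.60 only up to rounding. The median cannot be recovered from the table. `TABLE` and `summarize` are illustrative names, not part of the dataset.

```python
# Per-dataset (category, pairs, mean delta) copied from the Subset Composition table.
TABLE = {
    "action_antonym_100pct": ("error", 1112, 1.40),
    "all_error_perturbation_15pct": ("error", 1111, 2.33),
    "element_substitution_100pct": ("error", 1111, 1.68),
    "equipment_substitution_100pct": ("error", 1111, 1.58),
    "numerical_perturbation_100pct": ("error", 1111, 1.70),
    "llm_representational_perturbation_15pct": ("represent", 1111, 1.47),
    "llm_to_formula_100pct": ("represent", 1111, 1.33),
    "llm_to_iupac_100pct": ("represent", 1111, 1.47),
    "llm_to_name_100pct": ("represent", 1111, 1.41),
}


def summarize(table):
    """Recompute totals and the pair-weighted mean delta from per-dataset rows."""
    total = sum(pairs for _, pairs, _ in table.values())
    error = sum(pairs for cat, pairs, _ in table.values() if cat == "error")
    # Weighting each dataset's mean by its pair count gives the overall mean.
    weighted_mean = sum(pairs * delta for _, pairs, delta in table.values()) / total
    return total, error, total - error, weighted_mean


total, error, represent, weighted_mean = summarize(TABLE)
print(total, error, represent)        # 10000 5556 4444
print(f"{100 * error / total:.0f}%")  # 56%
print(f"{weighted_mean:.2f}")         # 1.60
```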
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("iknow-lab/JudgeBias-DPO-RefFree-subset-10k")
train = dataset["train"]
val = dataset["validation"]
```
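DPO trainers such as TRL's `DPOTrainer` expect `prompt`, `chosen`, and `rejected` columns; the remaining fields here are metadata. A minimal sketch of selecting a slice before training, assuming rows are plain dicts (as produced by iterating a split). `to_dpo_rows` and its `min_delta` threshold are hypothetical, illustrative choices, not part of the dataset's recipe.

```python
def to_dpo_rows(rows, category=None, min_delta=0.0):
    """Hypothetical helper: keep prompt/chosen/rejected, optionally filtered.

    `category` matches `perturbation_category` (e.g. "error" or "represent");
    `min_delta` drops pairs whose `score_delta` is below the threshold.
    """
    out = []
    for row in rows:
        if category is not None and row["perturbation_category"] != category:
            continue
        if row["score_delta"] < min_delta:
            continue
        out.append({key: row[key] for key in ("prompt", "chosen", "rejected")})
    return out


rows = [
    {"prompt": "p1", "chosen": "c1", "rejected": "r1",
     "perturbation_category": "error", "score_delta": 2.0},
    {"prompt": "p2", "chosen": "c2", "rejected": "r2",
     "perturbation_category": "represent", "score_delta": 0.5},
]
print(to_dpo_rows(rows, category="error"))
# [{'prompt': 'p1', 'chosen': 'c1', 'rejected': 'r1'}]
```

With the `datasets` library, the same selection can stay columnar, e.g. `train.filter(lambda r: r["perturbation_category"] == "error")` followed by `remove_columns(...)` for the metadata fields.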
Provided by:
iknow-lab



