sumuks/litbench-ha
Source: Hugging Face. Updated 2026-04-05; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/sumuks/litbench-ha
Resource description:
---
license: other
language:
- en
size_categories:
- 10K<n<100K
pretty_name: LitBench-HA
tags:
- dpo
- preference-pairs
- creative-writing
---
# LitBench-HA
DPO-style preference pairs derived from LitBench: writing prompts with a community-preferred continuation
(`chosen`) versus a lower-voted one (`rejected`).
## Source
- Training pairs: `SAA-Lab/LitBench-Train`
- Test pair IDs: `SAA-Lab/LitBench-Test-IDs-Complete-Final`
## Splits
| Split | Rows |
| ----- | ----: |
| train | 14924 |
| test | 819 |
## Building this subset
- **Upvote margin**: `chosen_upvotes - rejected_upvotes >= 50`
- **Length cap**: `prompt`, `chosen`, and `rejected` are each tokenized with **cl100k_base** and limited to at most **1024** tokens.
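
The two criteria above can be sketched as a single row predicate. This is only an illustration, not the script used to build the dataset: it assumes the length cap drops over-long pairs rather than truncating them, and that the source rows expose `chosen_upvotes` and `rejected_upvotes` under those names. `keep_pair` and `whitespace_encode` are hypothetical helpers; for real counts, pass the `encode` method of `tiktoken.get_encoding("cl100k_base")` in place of the whitespace stand-in.

```python
from typing import Callable, Sequence

MIN_UPVOTE_MARGIN = 50   # chosen_upvotes - rejected_upvotes >= 50
MAX_TOKENS = 1024        # per-field cap stated in the card


def keep_pair(row: dict, encode: Callable[[str], Sequence]) -> bool:
    """Return True if a row passes both criteria (hypothetical helper).

    Assumption: the length cap is applied as a filter, not as truncation.
    """
    margin = row["chosen_upvotes"] - row["rejected_upvotes"]
    if margin < MIN_UPVOTE_MARGIN:
        return False
    # Every text field must fit within the token budget on its own.
    return all(
        len(encode(row[column])) <= MAX_TOKENS
        for column in ("prompt", "chosen", "rejected")
    )


def whitespace_encode(text: str) -> list:
    """Stand-in tokenizer for the demo only; it is NOT cl100k_base."""
    return text.split()


example = {
    "prompt": "Write a story about a lighthouse keeper.",
    "chosen": "The lamp had not gone out in forty years.",
    "rejected": "There was a lighthouse.",
    "chosen_upvotes": 120,
    "rejected_upvotes": 40,
}
print(keep_pair(example, whitespace_encode))
```

Injecting the tokenizer keeps the predicate easy to test offline; swapping in the real encoder changes only the token counts, not the logic.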
## Schema
| Column | Type | Description |
| ------ | ---- | ----------- |
| `prompt` | string | Shared writing prompt / setup |
| `chosen` | string | Higher-voted story |
| `rejected` | string | Lower-voted story |
| `metadata` | string | JSON: `chosen_upvotes`, `rejected_upvotes`, `upvote_margin` |
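
Because `metadata` is stored as a JSON string, it needs decoding before the vote counts can be used. A minimal sketch, assuming the string decodes to an object holding the three keys listed in the table; `parse_metadata` and `sample_row` are hypothetical and the values are made up for illustration:

```python
import json

METADATA_KEYS = ("chosen_upvotes", "rejected_upvotes", "upvote_margin")


def parse_metadata(row: dict) -> dict:
    """Decode the JSON `metadata` column of one row (hypothetical helper)."""
    decoded = json.loads(row["metadata"])
    # Keep only the documented keys; a missing key raises KeyError.
    return {key: decoded[key] for key in METADATA_KEYS}


# Illustrative row, not taken from the dataset.
sample_row = {
    "prompt": "Write a story about a lighthouse keeper.",
    "chosen": "The lamp had not gone out in forty years.",
    "rejected": "There was a lighthouse.",
    "metadata": json.dumps(
        {"chosen_upvotes": 120, "rejected_upvotes": 40, "upvote_margin": 80}
    ),
}

votes = parse_metadata(sample_row)
print(votes["upvote_margin"])
```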
## Loading
```python
from datasets import load_dataset
# If you see split-size verification errors from stale Hub metadata, use:
dataset_dict = load_dataset("sumuks/litbench-ha", verification_mode="no_checks")
```
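
Since `verification_mode="no_checks"` skips the library's own size verification, it may be worth comparing the loaded splits against the row counts in the Splits table by hand. The helper below is a hypothetical sketch that accepts any mapping from split name to a sized object, so it should work on the `dataset_dict` loaded above as well as on the plain stand-in used here:

```python
from typing import Mapping, Sized

# Row counts copied from the Splits table of this card.
EXPECTED_ROWS = {"train": 14924, "test": 819}


def check_split_sizes(splits: Mapping[str, Sized]) -> dict:
    """Map each expected split to True if present with the expected size."""
    return {
        name: name in splits and len(splits[name]) == expected
        for name, expected in EXPECTED_ROWS.items()
    }


# Stand-in for a loaded dataset; ranges only mimic the lengths.
fake_splits = {"train": range(14924), "test": range(819)}
print(check_split_sizes(fake_splits))
```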
Provider:
sumuks



