Korean Human Preference Dataset
Source: Snowflake | Updated: 2025-12-10 | Indexed: 2025-12-11
Download link:
https://app.snowflake.com/marketplace/listing/GZTHZCDT1L
Resource description:
**Overview**<br/>The Human Preference Dataset is a curated Korean-language dataset designed for preference modeling, reward model training, and alignment research. It contains human-annotated preference pairs and evaluations across major knowledge domains.
<p><br/></p>
**Dataset Composition**<br/>A total of **20,000 Korean-language preference samples**, distributed across three major domain categories:
- **Law (Civil, Criminal, Public)** - *6,000 samples*<br/>Includes human preference judgments on legal reasoning, statutory interpretation, case analysis, and domain-specific scenarios.
- **STEM (Science, Mathematics, Engineering, Technology)** - *8,000 samples*<br/>Covers problem-solving steps, explanations, reasoning quality, and correctness assessments.
- **Economics, Politics, Social Sciences** - *6,000 samples*<br/>Provides preference evaluations on analytical responses, argument quality, policy reasoning, and social context understanding.
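
The per-domain counts above can be checked against the stated total and turned into proportions, which may help when planning a stratified train/validation split. This is a minimal sketch: the dictionary keys are illustrative labels chosen here, not identifiers from the dataset itself.

```python
# Sample counts per domain, as stated in the listing.
# The key names are illustrative labels, not fields from the dataset.
DOMAIN_COUNTS = {
    "law": 6_000,
    "stem": 8_000,
    "economics_politics_social": 6_000,
}


def domain_shares(counts):
    """Return each domain's fraction of the total sample count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total = sum(DOMAIN_COUNTS.values())
shares = domain_shares(DOMAIN_COUNTS)

for name, share in shares.items():
    print(f"{name}: {DOMAIN_COUNTS[name]} samples ({share:.0%})")
print(f"total: {total}")
```

The three counts sum to the stated 20,000, with STEM making up 40% and the other two categories 30% each.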
<p><br/></p>
**Key Features**
- Fully human-annotated Korean preference pairs
- Designed for reward modeling and alignment tasks
- Suitable for use in supervised fine-tuning, RM training, or offline RLHF pipelines
- Cleaned, deduplicated, and formatted for direct model training
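
For readers new to reward modeling, preference pairs of this kind are commonly used with a pairwise (Bradley-Terry style) loss, which penalizes a reward model when it scores the rejected response above the preferred one. The listing does not prescribe a training objective, so the following is only a sketch of one common choice, written in plain Python with hypothetical scalar scores standing in for model outputs.

```python
import math


def pairwise_preference_loss(chosen_score, rejected_score):
    """Compute -log(sigmoid(chosen_score - rejected_score)).

    Uses the identity -log(sigmoid(m)) = log(1 + exp(-m)) and branches
    on the sign of the margin so that exp() never overflows.
    """
    margin = chosen_score - rejected_score
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward scores for one preference pair.
loss_correct_order = pairwise_preference_loss(2.0, 0.0)  # chosen scored higher
loss_wrong_order = pairwise_preference_loss(0.0, 2.0)    # rejected scored higher
loss_tie = pairwise_preference_loss(1.0, 1.0)            # equal scores
```

The loss is small when the preferred response receives the higher score, equals log 2 when the two scores tie, and grows roughly linearly as the rejected response pulls ahead.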
<p><br/></p>
**Format**<br/>Provided as JSON, in a structure typical of preference datasets.
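
The listing does not publish the exact schema. As a rough illustration of how such a file might be read, the sketch below assumes a JSON array of records with `prompt`, `chosen`, and `rejected` fields. These field names are hypothetical, borrowed from a layout common in other preference datasets, and should be checked against the actual files before use.

```python
import json

# Hypothetical field names; the real schema is not given in the listing.
REQUIRED_FIELDS = ("prompt", "chosen", "rejected")


def load_preference_pairs(text):
    """Parse a JSON array of records into (prompt, chosen, rejected) tuples."""
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    pairs = []
    for i, record in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            raise ValueError(f"record {i} is missing fields: {missing}")
        pairs.append((record["prompt"], record["chosen"], record["rejected"]))
    return pairs


# A made-up record used only to exercise the loader.
sample = json.dumps(
    [{"prompt": "질문", "chosen": "더 나은 답변", "rejected": "덜 나은 답변"}],
    ensure_ascii=False,
)
pairs = load_preference_pairs(sample)
```

Validating required fields up front makes a schema mismatch fail loudly at load time rather than partway through training.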
Provider:
Flitto
Created:
2025-12-08



