HAERAE-HUB/KUDGE
Source: Hugging Face. Updated 2024-09-20; indexed 2025-04-12.
Download link: https://hf-mirror.com/datasets/HAERAE-HUB/KUDGE
---
configs:
- config_name: Pointwise
  data_files:
  - split: test
    path: kudge-pointwise.csv
- config_name: Pairwise
  data_files:
  - split: test
    path: kudge-pairwise.csv
- config_name: Pointwise-False
  data_files:
  - split: test
    path: kudge-pointwise-falseinfo.csv
- config_name: Pairwise-False
  data_files:
  - split: test
    path: kudge-pairwise-falseinfo.csv
- config_name: Human Annotations
  data_files:
  - split: test
    path: kudge-human-annotation-raw.csv
---
Official data repository for [LLM-as-a-Judge & Reward Model: What They Can and Cannot Do](https://arxiv.org/abs/2409.11239)
_TL;DR: Automated evaluators (LLM-as-a-Judge, reward models) can be transferred to non-English settings without additional training (most of the time)._
## Dataset Description
To the best of our knowledge, KUDGE is currently the only non-English, human-annotated meta-evaluation dataset.
It consists of 5,012 human annotations from native Korean speakers, and we expect KUDGE to be widely used as a tool for meta-evaluation research.
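
Meta-evaluation here means checking how often an automated evaluator agrees with the human annotations. The following is a minimal sketch of that agreement computation, assuming both sides have already been reduced to comparable labels (for example `"A"` or `"B"` for a pairwise verdict). The function name `agreement_rate` and the label encoding are illustrative assumptions, not part of the dataset or the paper.

```python
from typing import Sequence


def agreement_rate(judge_verdicts: Sequence[str], human_verdicts: Sequence[str]) -> float:
    """Fraction of items where the automated judge gives the same label as the human.

    The labels are hypothetical placeholders; map the dataset's actual
    columns to comparable labels before calling this.
    """
    if len(judge_verdicts) != len(human_verdicts):
        raise ValueError("verdict lists must have the same length")
    if not human_verdicts:
        raise ValueError("need at least one item")
    # Count positions where judge and human agree.
    matches = sum(j == h for j, h in zip(judge_verdicts, human_verdicts))
    return matches / len(human_verdicts)
```

A higher rate suggests the evaluator tracks human judgment more closely on the sampled items. It says nothing about why the two disagree on the rest.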
### Subsets
- **Pointwise/Pairwise:** The pointwise and pairwise subsets of KUDGE. You can feed the `judge_query` column directly to an LLM to use it as an LLM-as-a-Judge.
- **Pointwise/Pairwise-False:** Manually created subsets in which responses are corrupted with false information. They can be used to test the robustness of automated evaluators against factual hallucinations.
- **Human Annotations:** The raw human annotations as collected: 5,638 instances (note: 5,760 were expected, but some are missing due to system errors).
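
The subsets above can be loaded by config name with the Hugging Face `datasets` library, and each `judge_query` can then be passed to a judge model. The sketch below assumes `datasets` is installed and the Hub is reachable. `load_kudge` and `run_judge` are hypothetical helper names, and `judge` stands in for whatever LLM call you use. Only the config names, the `test` split, and the `judge_query` column come from this card.

```python
from typing import Callable, Iterable, List, Mapping

# Config names copied from the YAML header of this card.
KUDGE_CONFIGS = (
    "Pointwise",
    "Pairwise",
    "Pointwise-False",
    "Pairwise-False",
    "Human Annotations",
)


def load_kudge(config: str = "Pointwise"):
    """Load one KUDGE subset; every config exposes a single `test` split."""
    if config not in KUDGE_CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {KUDGE_CONFIGS}")
    # Imported lazily so the helpers in this file also work without `datasets`.
    from datasets import load_dataset

    return load_dataset("HAERAE-HUB/KUDGE", config, split="test")


def run_judge(rows: Iterable[Mapping[str, str]], judge: Callable[[str], str]) -> List[str]:
    """Send each row's `judge_query` to `judge` and collect the raw replies.

    `judge` is a placeholder for your own model call (API client, local model, ...).
    """
    return [judge(row["judge_query"]) for row in rows]
```

For example, `run_judge(load_kudge("Pairwise"), my_llm)` would return one raw reply per row, where `my_llm` is your own function from prompt string to response string. Parsing those replies into scores or verdicts depends on the prompt format and is left out here.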
### How to Cite
```
@article{son2024llm,
title={LLM-as-a-Judge \& Reward Model: What They Can and Cannot Do},
author={Son, Guijin and Ko, Hyunwoo and Lee, Hoyoung and Kim, Yewon and Hong, Seunghyeok},
journal={arXiv preprint arXiv:2409.11239},
year={2024}
}
```
### Point of Contact
```
spthsrbwls123@yonsei.ac.kr
```
Provider: HAERAE-HUB



