HAERAE-HUB/KUDGE

Name: HAERAE-HUB/KUDGE
Creator: HAERAE-HUB
Published: 2024-09-20 04:16:55
License: not specified

Source: Hugging Face. Updated 2024-09-20; indexed 2025-04-12.

Download link:

https://hf-mirror.com/datasets/HAERAE-HUB/KUDGE


Description:

---
configs:
- config_name: Pointwise
  data_files:
  - split: test
    path: kudge-pointwise.csv
- config_name: Pairwise
  data_files:
  - split: test
    path: kudge-pairwise.csv
- config_name: Pointwise-False
  data_files:
  - split: test
    path: kudge-pointwise-falseinfo.csv
- config_name: Pairwise-False
  data_files:
  - split: test
    path: kudge-pairwise-falseinfo.csv
- config_name: Human Annotations
  data_files:
  - split: test
    path: kudge-human-annotation-raw.csv
---

Official data repository for [LLM-as-a-Judge & Reward Model: What They Can and Cannot Do](https://arxiv.org/abs/2409.11239)

_TLDR: Automated evaluators (LLM-as-a-Judge, reward models) can be transferred to non-English settings without additional training (most of the time)._

## Dataset Description

To the best of our knowledge, KUDGE is currently the only non-English, human-annotated meta-evaluation dataset. Consisting of 5,012 human annotations from native Korean speakers, KUDGE is intended to serve as a widely used tool for meta-evaluation research.

### Subsets

- **Pointwise/Pairwise:** The pointwise and pairwise subsets of KUDGE. You may pass the `judge_query` column directly to an LLM to use it as an LLM-as-a-Judge.
- **Pointwise/Pairwise-False:** A manually created subset in which responses are corrupted with false information. It may be used to test the robustness of automated evaluators against factual hallucinations.
- **Human Annotations:** The raw human annotations as collected: 5,638 instances (5,760 were expected, but some are missing due to system errors).

### How to Cite

```
@article{son2024llm,
  title={LLM-as-a-Judge \& Reward Model: What They Can and Cannot Do},
  author={Son, Guijin and Ko, Hyunwoo and Lee, Hoyoung and Kim, Yewon and Hong, Seunghyeok},
  journal={arXiv preprint arXiv:2409.11239},
  year={2024}
}
```

### Point of Contact

```
spthsrbwls123@yonsei.ac.kr
```
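The configs and the `judge_query` workflow described above could be used roughly as follows. This is a minimal sketch, not code from the dataset authors: it assumes the Hugging Face `datasets` library is installed, that the config names resolve exactly as listed in the YAML header, and that each Pointwise/Pairwise row exposes a `judge_query` string. The helper names `load_subset` and `run_judge` are hypothetical, and `judge` stands in for whatever LLM client you use.

```python
from typing import Callable, Dict, Iterable, List

REPO_ID = "HAERAE-HUB/KUDGE"

# Config name -> CSV file, copied from the YAML header of the dataset card.
CONFIG_FILES: Dict[str, str] = {
    "Pointwise": "kudge-pointwise.csv",
    "Pairwise": "kudge-pairwise.csv",
    "Pointwise-False": "kudge-pointwise-falseinfo.csv",
    "Pairwise-False": "kudge-pairwise-falseinfo.csv",
    "Human Annotations": "kudge-human-annotation-raw.csv",
}


def load_subset(config_name: str):
    """Load the single `test` split of one KUDGE config (needs network access)."""
    if config_name not in CONFIG_FILES:
        raise ValueError(
            f"unknown config {config_name!r}; expected one of {sorted(CONFIG_FILES)}"
        )
    # Imported lazily so the rest of this module works without the dependency.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split="test")


def run_judge(rows: Iterable[dict], judge: Callable[[str], str]) -> List[str]:
    """Send each row's `judge_query` to `judge` and collect the raw replies.

    `judge` is a hypothetical callable wrapping your LLM of choice; parsing a
    score or preference out of each reply is left to the caller.
    """
    return [judge(row["judge_query"]) for row in rows]


# Example usage (requires `pip install datasets` and network access):
#   pointwise = load_subset("Pointwise")
#   replies = run_judge(pointwise.select(range(3)), my_llm_call)
```

Keeping the judge as a plain callable means the same loop can be reused for the `Pointwise-False` and `Pairwise-False` configs to check whether an evaluator's verdict changes when a response contains false information.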


Provider:

HAERAE-HUB
