
HAERAE-HUB/KUDGE

Source: Hugging Face. Updated 2024-09-20. Indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/HAERAE-HUB/KUDGE
Resource summary:
---
configs:
- config_name: Pointwise
  data_files:
  - split: test
    path: kudge-pointwise.csv
- config_name: Pairwise
  data_files:
  - split: test
    path: kudge-pairwise.csv
- config_name: Pointwise-False
  data_files:
  - split: test
    path: kudge-pointwise-falseinfo.csv
- config_name: Pairwise-False
  data_files:
  - split: test
    path: kudge-pairwise-falseinfo.csv
- config_name: Human Annotations
  data_files:
  - split: test
    path: kudge-human-annotation-raw.csv
---

Official data repository for [LLM-as-a-Judge & Reward Model: What They Can and Cannot Do](https://arxiv.org/abs/2409.11239)

_TL;DR: Automated evaluators (LLM-as-a-Judge, reward models) can be transferred to non-English settings without additional training (most of the time)._

## Dataset Description

To the best of our knowledge, KUDGE is currently the only non-English, human-annotated meta-evaluation dataset. It consists of 5,012 human annotations from native Korean speakers, and we expect it to be widely used as a tool for meta-evaluation research.

### Subsets

- **Pointwise/Pairwise:** The pointwise and pairwise subsets of KUDGE. You may pass the `judge_query` column directly to an LLM to use it as an LLM-as-a-Judge.
- **Pointwise/Pairwise-False:** A manually created subset in which responses are corrupted with false information. It may be used to test the robustness of automated evaluators against factual hallucinations.
- **Human Annotations:** The raw human annotations as collected: 5,638 instances (5,760 were expected, but some are missing due to system errors).

### How to Cite

```
@article{son2024llm,
  title={LLM-as-a-Judge \& Reward Model: What They Can and Cannot Do},
  author={Son, Guijin and Ko, Hyunwoo and Lee, Hoyoung and Kim, Yewon and Hong, Seunghyeok},
  journal={arXiv preprint arXiv:2409.11239},
  year={2024}
}
```

### Point of Contact

```
spthsrbwls123@yonsei.ac.kr
```
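
The card says the `judge_query` column can be passed directly to an LLM. A minimal sketch of that workflow is shown here; it is an illustration under stated assumptions, not the authors' evaluation code. The config-to-file mapping is copied from the YAML header. The helper names (`load_kudge`, `iter_judge_queries`, `run_judge`) and the `judge` callable are hypothetical, and `judge` stands in for whatever LLM client you use. Loading from the Hub is assumed to go through the `datasets` library's `load_dataset`, which needs network access.

```python
import csv
import io
from typing import Callable, Dict, Iterable, Iterator, List

# Config name -> CSV file, copied from the dataset card's YAML header.
KUDGE_CONFIGS: Dict[str, str] = {
    "Pointwise": "kudge-pointwise.csv",
    "Pairwise": "kudge-pairwise.csv",
    "Pointwise-False": "kudge-pointwise-falseinfo.csv",
    "Pairwise-False": "kudge-pairwise-falseinfo.csv",
    "Human Annotations": "kudge-human-annotation-raw.csv",
}


def load_kudge(config: str):
    """Load one KUDGE config's test split from the Hub (hypothetical helper).

    Requires `pip install datasets` and network access.
    """
    if config not in KUDGE_CONFIGS:
        raise KeyError(f"unknown KUDGE config: {config!r}")
    # Imported lazily so the rest of this sketch runs offline.
    from datasets import load_dataset
    return load_dataset("HAERAE-HUB/KUDGE", config, split="test")


def iter_judge_queries(rows: Iterable[dict]) -> Iterator[str]:
    """Yield the ready-made judge prompt stored in each row's `judge_query` column."""
    for row in rows:
        yield row["judge_query"]


def run_judge(rows: Iterable[dict], judge: Callable[[str], str]) -> List[str]:
    """Send every `judge_query` to `judge` and collect the raw replies.

    `judge` is an assumption: any function mapping a prompt string to a reply.
    """
    return [judge(query) for query in iter_judge_queries(rows)]


if __name__ == "__main__":
    # Offline demo on a made-up two-row CSV; real rows come from load_kudge(...).
    fake_csv = io.StringIO("judge_query\nRate this answer\nCompare A and B\n")
    rows = list(csv.DictReader(fake_csv))
    print(run_judge(rows, judge=lambda prompt: f"(reply to: {prompt})"))
```

With network access, `run_judge(load_kudge("Pointwise"), judge=my_llm)` would apply the same loop to the real pointwise subset, where `my_llm` is your own function. Parsing scores or preferences out of the replies is left out because the card does not specify the output format.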

Provider:
HAERAE-HUB