Meta-Evaluation Dataset
Source: arXiv. Indexed 2025-09-30.
Download link:
https://github.com/GAIR-NLP/MetaCritique
Resource description:
This dataset was built for the meta-evaluation of critiques produced by large language models (LLMs) and human annotators across different natural language processing (NLP) domains. It contains both human-written and LLM-generated critiques, and it is used to evaluate the performance of various LLMs and of the MetaCritique framework. The dataset covers 4 tasks spanning 16 public datasets, and its core task is to evaluate and compare critiques.



