Meta-Evaluation Dataset

Indexed from arXiv on 2025-09-30

Download link:

https://github.com/GAIR-NLP/MetaCritique

Overview:

This dataset was built for the meta-evaluation of critiques, both human-written and generated by large language models (LLMs), across several natural language processing (NLP) domains. It focuses on evaluating the performance of various LLMs and of the MetaCritique framework. It covers 4 tasks spanning 16 public datasets, and its core task is to evaluate and compare critiques.
