DONKII
Source: arXiv. Updated: 2024-02-22. Indexed: 2024-06-21.
Download link:
https://github.com/mainlp/donkii
Resource overview:
The DONKII dataset was created by the Center for Information and Language Processing at Ludwig Maximilian University of Munich (LMU Munich). It comprises three instruction-tuning datasets that have been annotated for errors, and it is intended as an evaluation benchmark for large language models (LLMs). The errors were labeled through a combination of manual and programmatic analysis and span several types, including output errors, factual errors, and insufficient input. Its main application is improving model quality: cleaning errors out of instruction-tuning datasets in order to improve the zero-shot capabilities of LLMs.
Provider:
Center for Information and Language Processing, LMU Munich
Created:
2023-09-04



