FollowEval
Source: arXiv | Updated: 2023-11-16 | Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2311.09829v1
Description:
FollowEval is a multi-dimensional benchmark for evaluating how well large language models follow instructions. The dataset consists of 200 manually written test instances in English and Chinese and assesses model performance along five dimensions: string manipulation, commonsense reasoning, logical reasoning, spatial reasoning, and response constraints. Each test instance is designed by human experts and combines more than one of these dimensions, which makes the instances more complex and challenging. The dataset is intended mainly for work on improving the instruction-following ability of large language models and, with it, their reliability and practical usefulness.
Provided by:
AI Lab, Lenovo Research
Created:
2023-11-16



