
FollowEval

Source: arXiv. Updated: 2023-11-16. Indexed: 2024-08-06.
Download link:
http://arxiv.org/abs/2311.09829v1
Resource overview:
FollowEval is a multi-dimensional benchmark for evaluating how well large language models follow instructions. It consists of 200 manually written test instances in English and Chinese and assesses models along five dimensions: string manipulation, commonsense reasoning, logical reasoning, spatial reasoning, and response constraints. Every instance is designed by human experts and combines more than one of these dimensions, which makes the benchmark more complex and challenging. The dataset is intended mainly for improving the instruction-following ability of large language models and for verifying their reliability and practical usefulness.
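Because each instance covers several dimensions at once, per-dimension scores have to credit one pass/fail result to every dimension that instance touches. The following is a minimal sketch of that bookkeeping, not the dataset's actual format or official scoring code: the `Instance` schema, the dimension identifiers, and `per_dimension_accuracy` are all hypothetical names introduced here for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# The five dimensions named in the overview; the identifiers are assumptions.
DIMENSIONS = frozenset({
    "string_manipulation",
    "commonsense_reasoning",
    "logical_reasoning",
    "spatial_reasoning",
    "response_constraints",
})


@dataclass(frozen=True)
class Instance:
    """Hypothetical record for one test instance (not the real schema)."""
    instruction: str
    language: str          # "en" or "zh"
    dimensions: frozenset  # an instance may cover several dimensions


def per_dimension_accuracy(instances, passed):
    """Credit each instance's pass/fail result to every dimension it covers."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for inst, ok in zip(instances, passed):
        unknown = inst.dimensions - DIMENSIONS
        if unknown:
            raise ValueError(f"unknown dimensions: {sorted(unknown)}")
        for dim in inst.dimensions:
            totals[dim] += 1
            hits[dim] += int(ok)
    return {dim: hits[dim] / totals[dim] for dim in totals}


# Two made-up instances, for illustration only.
examples = [
    Instance(
        "Reverse the word 'cat' and answer with a single word.",
        "en",
        frozenset({"string_manipulation", "response_constraints"}),
    ),
    Instance(
        "If A is taller than B and B is taller than C, name the shortest. "
        "Answer with one letter.",
        "en",
        frozenset({"logical_reasoning", "response_constraints"}),
    ),
]
scores = per_dimension_accuracy(examples, [True, False])
```

With this design a dimension's accuracy is computed only over the instances that include it, so dimensions that appear in more instances are not over-weighted relative to rarer ones.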
Provided by:
AI Lab, Lenovo Research
Created:
2023-11-16