IFEval

Name: IFEval
Creator: maas
Published: 2026-05-16 14:36:15
License: No description provided

ModelScope community; updated 2026-05-16; indexed 2025-04-26

Download link:

https://modelscope.cn/datasets/google/IFEval

Resource description:

# Dataset Card for IFEval

## Dataset Description

- **Repository:** https://github.com/google-research/google-research/tree/master/instruction_following_eval
- **Paper:** https://huggingface.co/papers/2311.07911
- **Leaderboard:** https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
- **Point of Contact:** [Le Hou](mailto:lehou@google.com)

### Dataset Summary

This dataset contains the prompts used in the [Instruction-Following Eval (IFEval) benchmark](https://arxiv.org/abs/2311.07911) for large language models. It contains around 500 "verifiable instructions" such as "write in more than 400 words" and "mention the keyword of AI at least 3 times" which can be verified by heuristics. To load the dataset, run:

```python
from datasets import load_dataset

ifeval = load_dataset("google/IFEval")
```

### Supported Tasks and Leaderboards

The IFEval dataset is designed for evaluating chat or instruction fine-tuned language models and is one of the core benchmarks used in the [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard).

### Languages

The data in IFEval are in English (BCP-47 en).

## Dataset Structure

### Data Instances

An example of the `train` split looks as follows:

```
{
    "key": 1000,
    "prompt": 'Write a 300+ word summary of the wikipedia page "https://en.wikipedia.org/wiki/Raymond_III,_Count_of_Tripoli". Do not use any commas and highlight at least 3 sections that has titles in markdown format, for example *highlighted section part 1*, *highlighted section part 2*, *highlighted section part 3*.',
    "instruction_id_list": [
        "punctuation:no_comma",
        "detectable_format:number_highlighted_sections",
        "length_constraints:number_words",
    ],
    "kwargs": [
        {
            "num_highlights": None,
            "relation": None,
            "num_words": None,
            "num_placeholders": None,
            "prompt_to_repeat": None,
            "num_bullets": None,
            "section_spliter": None,
            "num_sections": None,
            "capital_relation": None,
            "capital_frequency": None,
            "keywords": None,
            "num_paragraphs": None,
            "language": None,
            "let_relation": None,
            "letter": None,
            "let_frequency": None,
            "end_phrase": None,
            "forbidden_words": None,
            "keyword": None,
            "frequency": None,
            "num_sentences": None,
            "postscript_marker": None,
            "first_word": None,
            "nth_paragraph": None,
        },
        {
            "num_highlights": 3,
            "relation": None,
            "num_words": None,
            "num_placeholders": None,
            "prompt_to_repeat": None,
            "num_bullets": None,
            "section_spliter": None,
            "num_sections": None,
            "capital_relation": None,
            "capital_frequency": None,
            "keywords": None,
            "num_paragraphs": None,
            "language": None,
            "let_relation": None,
            "letter": None,
            "let_frequency": None,
            "end_phrase": None,
            "forbidden_words": None,
            "keyword": None,
            "frequency": None,
            "num_sentences": None,
            "postscript_marker": None,
            "first_word": None,
            "nth_paragraph": None,
        },
        {
            "num_highlights": None,
            "relation": "at least",
            "num_words": 300,
            "num_placeholders": None,
            "prompt_to_repeat": None,
            "num_bullets": None,
            "section_spliter": None,
            "num_sections": None,
            "capital_relation": None,
            "capital_frequency": None,
            "keywords": None,
            "num_paragraphs": None,
            "language": None,
            "let_relation": None,
            "letter": None,
            "let_frequency": None,
            "end_phrase": None,
            "forbidden_words": None,
            "keyword": None,
            "frequency": None,
            "num_sentences": None,
            "postscript_marker": None,
            "first_word": None,
            "nth_paragraph": None,
        },
    ],
}
```

### Data Fields

The data fields are as follows:

* `key`: A unique ID for the prompt.
* `prompt`: Describes the task the model should perform.
* `instruction_id_list`: An array of verifiable instructions. See Table 1 of the paper for the full set with their descriptions.
* `kwargs`: An array of arguments used to specify each verifiable instruction in `instruction_id_list`.

### Data Splits

|               | train |
|---------------|------:|
| IFEval        |   541 |

### Licensing Information

The dataset is available under the [Apache 2.0 license](https://www.apache.org/licenses/LICENSE-2.0).

### Citation Information

```
@misc{zhou2023instructionfollowingevaluationlargelanguage,
      title={Instruction-Following Evaluation for Large Language Models},
      author={Jeffrey Zhou and Tianjian Lu and Swaroop Mishra and Siddhartha Brahma and Sujoy Basu and Yi Luan and Denny Zhou and Le Hou},
      year={2023},
      eprint={2311.07911},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2311.07911},
}
```
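The Data Fields description says each entry in `kwargs` supplies the arguments for the instruction at the same position in `instruction_id_list`, and the example record shows that most argument names are `None` for any given instruction. The sketch below pairs the two lists and keeps only the arguments that are set. It works on an abbreviated copy of the example record so it runs without downloading anything; the helper name `pair_instructions` is hypothetical and is not part of the dataset or its tooling.

```python
from typing import Any

# Abbreviated copy of the example record from the Data Instances section.
# The real record lists every possible argument name; only three are kept here.
record: dict[str, Any] = {
    "key": 1000,
    "instruction_id_list": [
        "punctuation:no_comma",
        "detectable_format:number_highlighted_sections",
        "length_constraints:number_words",
    ],
    "kwargs": [
        {"num_highlights": None, "relation": None, "num_words": None},
        {"num_highlights": 3, "relation": None, "num_words": None},
        {"num_highlights": None, "relation": "at least", "num_words": 300},
    ],
}


def pair_instructions(rec: dict[str, Any]) -> list[tuple[str, dict[str, Any]]]:
    """Pair each instruction ID with the arguments that are not None.

    Hypothetical helper: assumes `instruction_id_list` and `kwargs` are
    aligned by position, as the Data Fields description states.
    """
    pairs = []
    for instruction_id, args in zip(rec["instruction_id_list"], rec["kwargs"]):
        set_args = {name: value for name, value in args.items() if value is not None}
        pairs.append((instruction_id, set_args))
    return pairs


for instruction_id, args in pair_instructions(record):
    print(instruction_id, args)
```

The same helper should apply to a row obtained from `load_dataset("google/IFEval")["train"]`, assuming unset arguments are returned as `None` there as well.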

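The dataset summary notes that the instructions "can be verified by heuristics". As a rough illustration of what such checks could look like for the three instructions in the example record, here is a simplified sketch. It is not the official checker from the google-research repository: the function names, the regular expression, and the whitespace-based word count are all assumptions made for illustration, and the `"less than"` relation is assumed (only `"at least"` appears in the example record).

```python
import re


def has_no_comma(response: str) -> bool:
    """Heuristic for punctuation:no_comma: the response contains no comma."""
    return "," not in response


def count_highlighted_sections(response: str) -> int:
    """Count spans such as *highlighted section part 1* within a single line.

    Assumed pattern: an asterisk, one or more characters that are neither an
    asterisk nor a newline, then a closing asterisk.
    """
    return len(re.findall(r"\*[^\n*]+\*", response))


def meets_word_count(response: str, relation: str, num_words: int) -> bool:
    """Heuristic for length_constraints:number_words using whitespace splitting."""
    count = len(response.split())
    if relation == "at least":
        return count >= num_words
    if relation == "less than":  # assumed relation, not shown in the example record
        return count < num_words
    raise ValueError(f"unsupported relation: {relation!r}")


# Toy response used only to exercise the three checks.
sample = "*Early life* Raymond ruled Tripoli *Regency* and *Legacy* " + "word " * 300

print(has_no_comma(sample))
print(count_highlighted_sections(sample) >= 3)
print(meets_word_count(sample, "at least", 300))
```

Real evaluation should rely on the checker in the repository linked from the dataset card, since its tokenisation and edge-case handling may differ from this sketch.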

Provider:

maas

Created:

2025-04-21

Collection summary

Dataset introduction