kqsong/InFoBench

Hugging Face, updated 2024-01-10, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/kqsong/InFoBench
Resource description:
---
license: mit
language:
- en
pretty_name: InfoBench
size_categories:
- n<1K
---

# Dataset Card for InFoBench Dataset

## Table of Contents
- [Dataset Description](#dataset-description)
- [Dataset Usage](#dataset-usage)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
- [Additional Information](#additional-information)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)

## Dataset Description

- **Repository:** [InFoBench Repository](https://github.com/qinyiwei/InfoBench)
- **Paper:** [InFoBench: Evaluating Instruction Following Ability in Large Language Models](https://arxiv.org/pdf/2401.03601.pdf)

The InFoBench Dataset is an evaluation benchmark containing 500 instructions and 2250 corresponding decomposed requirements.

## Dataset Usage

You can download it directly with the Hugging Face `datasets` library.

```python
from datasets import load_dataset

# Downloads the dataset from the Hugging Face Hub on first use.
dataset = load_dataset("kqsong/InFoBench")
```

## Dataset Structure

### Data Instances

Each instance has an instruction string, an optional input string, a list of decomposed questions, and a list of labels for each decomposed question.

```json
{
  "id": "domain_oriented_task_215",
  "input": "",
  "category": "Business and Economics: Business Administration",
  "instruction": "Generate a non-disclosure agreement of two pages (each page is limited to 250 words) for a software development project involving Party A and Party B. The confidentiality duration should be 5 years. \n\nThe first page should include definitions for key terms such as 'confidential information', 'disclosure', and 'recipient'. \n\nOn the second page, provide clauses detailing the protocol for the return or destruction of confidential information, exceptions to maintaining confidentiality, and the repercussions following a breach of the agreement. \n\nPlease indicate the separation between the first and second pages with a full line of dashed lines ('-----'). Also, make sure that each page is clearly labeled with its respective page number.",
  "decomposed_questions": [
    "Is the generated text a non-disclosure agreement?",
    "Does the generated text consist of two pages?",
    "Is each page of the generated text limited to 250 words?",
    "Is the generated non-disclosure agreement for a software development project involving Party A and Party B?",
    "Does the generated non-disclosure agreement specify a confidentiality duration of 5 years?",
    "Does the first page of the generated non-disclosure agreement include definitions for key terms such as 'confidential information', 'disclosure', and 'recipient'?",
    "Does the second page of the generated non-disclosure agreement provide clauses detailing the protocol for the return or destruction of confidential information?",
    "Does the second page of the generated non-disclosure agreement provide exceptions to maintaining confidentiality?",
    "Does the second page of the generated non-disclosure agreement provide the repercussions following a breach of the agreement?",
    "Does the generated text indicate the separation between the first and second pages with a full line of dashed lines ('-----')?",
    "Does the generated text ensure that each page is clearly labeled with its respective page number?"
  ],
  "subset": "Hard_set",
  "question_label": [
    ["Format"],
    ["Format", "Number"],
    ["Number"],
    ["Content"],
    ["Content"],
    ["Format", "Content"],
    ["Content"],
    ["Content"],
    ["Content"],
    ["Format"],
    ["Format"]
  ]
}
```

### Data Fields

- `id`: a string.
- `subset`: `Hard_Set` or `Easy_Set`.
- `category`: a string containing categorical information.
- `instruction`: a string containing the instruction.
- `input`: a string containing context information; may be an empty string.
- `decomposed_questions`: a list of strings, each corresponding to a decomposed requirement.
- `question_label`: a list of lists of strings; each inner list holds the labels for the corresponding decomposed question.

## Additional Information

### Licensing Information

The InFoBench Dataset version 1.0.0 is released under the [MIT License](https://github.com/qinyiwei/InfoBench/blob/main/LICENSE).

### Citation Information

```
@article{qin2024infobench,
  title={InFoBench: Evaluating Instruction Following Ability in Large Language Models},
  author={Yiwei Qin and Kaiqiang Song and Yebowen Hu and Wenlin Yao and Sangwoo Cho and Xiaoyang Wang and Xuansheng Wu and Fei Liu and Pengfei Liu and Dong Yu},
  year={2024},
  eprint={2401.03601},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```
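The pairing between `decomposed_questions` and `question_label` described under Data Fields can be illustrated with a minimal sketch. It works on an abbreviated in-memory copy of the card's example record (only three of the eleven questions are kept), and the helper names `pair_questions_with_labels` and `count_labels` are hypothetical, not part of the dataset or any library.

```python
from collections import Counter

# Abbreviated copy of the card's example record: only the first three
# decomposed questions and their labels are kept to keep the sketch short.
record = {
    "id": "domain_oriented_task_215",
    "input": "",
    "subset": "Hard_set",
    "decomposed_questions": [
        "Is the generated text a non-disclosure agreement?",
        "Does the generated text consist of two pages?",
        "Is each page of the generated text limited to 250 words?",
    ],
    "question_label": [["Format"], ["Format", "Number"], ["Number"]],
}


def pair_questions_with_labels(rec):
    """Return (question, labels) pairs, checking that both lists align."""
    questions = rec["decomposed_questions"]
    labels = rec["question_label"]
    if len(questions) != len(labels):
        raise ValueError("decomposed_questions and question_label differ in length")
    return list(zip(questions, labels))


def count_labels(rec):
    """Count how often each label occurs across a record's questions."""
    return Counter(label for labels in rec["question_label"] for label in labels)


pairs = pair_questions_with_labels(record)
label_counts = count_labels(record)
for question, labels in pairs:
    print(f"{labels}: {question}")
```

A question can carry more than one label, so the label counts can sum to more than the number of questions.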

Provider:
kqsong

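Collected Summary
Dataset Introduction
Construction
InFoBench was built to meet the need for evaluating how well large language models follow instructions. The dataset contains 500 instructions and their 2250 corresponding decomposed requirements. Its carefully designed instructions and decomposed questions make it possible to quantify a model's performance in understanding and executing complex instructions.
Features
InFoBench is characterized by detailed instruction decomposition and diverse task categories. It covers tasks ranging from simple to complex, such as generating a non-disclosure agreement or writing a business letter, and each task is decomposed into multiple sub-questions so that a model's handling of individual details can be assessed more precisely. The dataset is released under the MIT license, which keeps its use open and flexible.
Usage
Users can download InFoBench directly through the Hugging Face `datasets` library. Each record consists of an instruction, an optional input string, a list of decomposed questions, and a list of labels for each decomposed question. Users can train and evaluate on the dataset's tasks as needed to improve a model's instruction-following ability.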
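As a minimal sketch of this workflow, the snippet below wraps the `load_dataset` call from the dataset card in a function and adds a filter on the `subset` field. The helper names `load_infobench` and `filter_by_subset` and the stand-in records are assumptions made for illustration. The comparison ignores letter case because the card writes the value as `Hard_Set` in the field list but `Hard_set` in the example record.

```python
def load_infobench():
    """Download the dataset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the rest of the sketch runs without the package.
    from datasets import load_dataset  # third-party: pip install datasets
    return load_dataset("kqsong/InFoBench")


def filter_by_subset(records, subset):
    """Keep records whose `subset` field matches, ignoring letter case."""
    wanted = subset.lower()
    return [rec for rec in records if rec["subset"].lower() == wanted]


# Offline demonstration with hypothetical stand-in records.
sample_records = [
    {"id": "a", "subset": "Hard_set"},
    {"id": "b", "subset": "Easy_set"},
    {"id": "c", "subset": "Hard_set"},
]
hard_ids = [rec["id"] for rec in filter_by_subset(sample_records, "Hard_Set")]
print(hard_ids)
```

With network access, the same filter could be applied to each split returned by `load_infobench()`, for example by looping over `load_infobench().items()` and passing each split to `filter_by_subset`; the sketch does not assume any particular split name.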
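Background and Challenges
Background Overview
InFoBench was created in 2024 by Yiwei Qin and colleagues to evaluate the ability of large language models to follow instructions. It contains 500 instructions and 2250 corresponding decomposed requirements, covering tasks in domains such as business administration and software development. InFoBench offers a new perspective and tool for studying instruction following and has had an important impact on natural language processing.
Current Challenges
The main challenges in building and applying InFoBench are: how to decompose complex instructions precisely enough to support fine-grained evaluation; how to handle diverse task domains and format requirements while maintaining data quality; and how to ensure that evaluation results are valid and fair, so as to advance research on and application of instruction following in large language models.
Common Scenarios
Classic Use Cases
In artificial intelligence, and especially in the study and evaluation of large language models, InFoBench has drawn attention for its focus on assessing instruction following. With 500 instructions and 2250 corresponding decomposed requirements, it gives researchers a precise testbed for checking how well a model understands and executes complex instructions.
Academic Problems Addressed
InFoBench addresses the difficulty that traditional evaluation methods have in quantifying instruction-following ability at a fine granularity. By decomposing each instruction into multiple sub-questions and labeling each sub-question with its category, the dataset gives researchers dimensions along which to analyze model performance in depth, advancing research on how large language models understand complex instructions.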
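The label-based analysis described above can be sketched as a per-label ratio of satisfied sub-questions. This is an illustrative sketch, not the scoring procedure from the InFoBench paper or repository: the function name `per_label_ratio` is hypothetical, and the yes/no judgments are made up for the example. Only the labels come from the dataset card's example record.

```python
from collections import defaultdict


def per_label_ratio(question_label, judgments):
    """Fraction of satisfied decomposed questions, grouped by label.

    question_label: list of label lists, one per decomposed question.
    judgments: list of booleans, True if the response satisfied the question.
    """
    if len(question_label) != len(judgments):
        raise ValueError("one judgment is required per decomposed question")
    satisfied = defaultdict(int)
    total = defaultdict(int)
    for labels, ok in zip(question_label, judgments):
        # A question with several labels counts once toward each of them.
        for label in labels:
            total[label] += 1
            satisfied[label] += int(ok)
    return {label: satisfied[label] / total[label] for label in total}


# Labels of the first six questions in the card's example record.
labels = [
    ["Format"], ["Format", "Number"], ["Number"],
    ["Content"], ["Content"], ["Format", "Content"],
]
# Hypothetical yes/no judgments, invented purely for illustration.
judgments = [True, True, False, True, True, False]

ratios = per_label_ratio(labels, judgments)
overall = sum(judgments) / len(judgments)
print(ratios, overall)
```

Grouping by label shows where a response fails (for example on `Number` constraints) even when the overall ratio looks acceptable.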
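Derived Related Work
Building on InFoBench, the research community has produced a range of related work, including in-depth analyses of instruction-following ability in large language models, comparative performance studies, and the development of customized models for domain-specific tasks. This work has further extended the scope of InFoBench and provides data support and reference points for research in related areas.
The above content was collected and summarized by 遇见数据集.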