IFEval_es

Name: IFEval_es
Creator: maas
Published: 2025-12-05 16:21:47
License: No description

ModelScope community: updated 2025-12-05, indexed 2025-02-01

Download link:

https://modelscope.cn/datasets/BSC-LT/IFEval_es


Resource summary:

# Dataset Card for IFEval_es

IFEval_es is a prompt dataset in Spanish, professionally translated from the main version of the English [IFEval](https://huggingface.co/datasets/google/IFEval) dataset.

## Dataset Details

### Dataset Description

IFEval_es (Instruction-Following Eval benchmark, Spanish) is designed to evaluate chat or instruction-fine-tuned language models. The dataset comprises 541 prompts containing "verifiable instructions", such as "write in more than 400 words" and "mention the keyword of AI at least 3 times", which can be verified by heuristics. Each instance contains exactly one input prompt.

- **Curated by:** [Language Technologies Unit | BSC-CNS](https://www.bsc.es/discover-bsc/organisation/research-departments/language-technologies-unit)
- **Funded by:** [ILENIA](https://proyectoilenia.es/en/)
- **Language(s) (NLP):** Spanish (`es-ES`)
- **License:** [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/deed) ([Original](https://huggingface.co/datasets/google/IFEval)) **

### Dataset Sources [optional]

- **Repository:** [HuggingFace](https://huggingface.co/datasets/BSC-LT)

## Uses

IFEval_es is intended to evaluate language models on "verifiable instructions".

### Out-of-Scope Use

IFEval_es should **not** be used to train any language model.

## Dataset Structure

The dataset is provided in JSONL format: each row corresponds to one prompt and contains an instance identifier, the input prompt, and the verifiable instructions that apply to it. Each row contains the following fields:

- `key`: the instance identifier (an integer in the example below).
- `prompt`: text string with the request.
- `instruction_id_list`: array of verifiable instruction identifiers.
- `kwargs`: array of argument objects, one per entry in `instruction_id_list` and in the same order, used to parameterize each verifiable instruction.

For example:

```
{
  "key": 1000,
  "prompt": "Escribe un resumen de más de 300 palabras de la página de la Wikipedia: \"https://es.wikipedia.org/wiki/Raimundo_III_de_Tr%C3%ADpoli\". No utilices comas y destaca al menos 3 secciones que tengan títulos en formato markdown, por ejemplo: *sección destacada parte 1*, *sección destacada parte 2*, *sección destacada parte 3*.",
  "instruction_id_list": [
    "es:punctuation:no_comma",
    "es:detectable_format:number_highlighted_sections",
    "es:length_constraints:number_words"
  ],
  "kwargs": [
    {},
    {"num_highlights": 3},
    {"relation": "at least", "num_words": 300}
  ]
}
```

IFEval_es contains the train split from the main version of the original dataset.

## Dataset Creation

### Curation Rationale

From the paper by Zhou, J. et al. (2023), *Instruction-Following Evaluation for Large Language Models*:

> Evaluation of instruction-following abilities in LLMs is not standardized: Human evaluations are expensive, slow, and not objectively reproducible, while LLM-based auto-evaluation is potentially biased or limited by the ability of the evaluator LLM. To overcome these issues, we introduce Instruction-Following Eval (IFEval) for large language models.

We have translated this dataset to improve Spanish support in the NLP field and to allow cross-lingual comparisons of language models.

### Source Data

IFEval_es comes from the main version of [IFEval](https://huggingface.co/datasets/google/IFEval), which is inspired by recurring prompts that are given to any language-model assistant.

#### Data Collection and Processing

Data was gathered from the main version of [IFEval](https://huggingface.co/datasets/google/IFEval). We did not modify the original dataset. The translation into Spanish followed these guidelines:

- **Date & Unit conversion**: Adapt dates, metric systems, currencies, etc., to our context, except when the task involves metric system conversion.
- **Personal Names**: Translate English names with clear Spanish equivalents; otherwise, use common names in our context. Maintain consistency in translated names throughout the text. Names of individual figures are not translated.
- **Language Style**: Avoid uniformity in translation, maintaining a rich and varied language that reflects our linguistic depth.
- **Dataset Logic**: Ensure the internal logic of datasets is maintained; answers should remain relevant and accurate. Factual accuracy is key in question-answer datasets. Maintain the correct option in multiple-choice datasets.
- **Error Handling**: Fix errors in the English text during translation unless otherwise specified for the specific dataset. Spelling mistakes must be corrected in Spanish.
- **Avoiding Patterns and Maintaining Length**: Avoid including patterns that could hint at the correct option, maintaining difficulty. Match the length of responses to the original text as closely as possible. Handle scientific terminology carefully to ensure consistency.

#### Who are the source data producers?

IFEval_es is a professional translation of [IFEval](https://huggingface.co/datasets/google/IFEval), completed by a single translator who is a native speaker of Spanish. The translator was provided with the entire test split, a set of translation preferences and guidelines, and a brief explanation of the original corpus. To ensure ongoing communication, the translator was asked to provide sample translations at regular intervals. These translations were then reviewed by a Spanish speaker within our team, who later translated and verified the metadata. Additionally, the translator was encouraged to ask for clarification on any specific doubts, and any necessary corrections were applied to the entire dataset.

### Annotations [optional]

#### Annotation process

Refer to the original paper (Zhou, J. et al. (2023). *Instruction-Following Evaluation for Large Language Models*).

#### Who are the annotators?

Refer to the original paper (Zhou, J. et al. (2023). *Instruction-Following Evaluation for Large Language Models*).

#### Personal and Sensitive Information

The dataset contains no personal or sensitive information.
## Bias, Risks, and Limitations

[More Information Needed]

### Recommendations

Users should be made aware of the risks, biases, and limitations of the dataset. More information is needed for further recommendations.

## Citation [optional]

```
@misc{zhou2023instructionfollowingevaluationlargelanguage,
      title={Instruction-Following Evaluation for Large Language Models},
      author={Jeffrey Zhou and Tianjian Lu and Swaroop Mishra and Siddhartha Brahma and Sujoy Basu and Yi Luan and Denny Zhou and Le Hou},
      year={2023},
      eprint={2311.07911},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2311.07911},
}
```

**BibTeX:**

[More Information Needed]

**APA:**

[More Information Needed]

## Glossary [optional]

[More Information Needed]

## More Information [optional]

This work is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the EU through NextGenerationEU, within the framework of the [project ILENIA](https://proyectoilenia.es/) with reference 2022/TL22/00215337.

** The license was changed to CC-BY-4.0 because the original authors only specified the default Apache 2.0 license, which is meant for software rather than data artifacts and does not require derivative works to be licensed under the same terms.

## Dataset Card Authors [optional]

[More Information Needed]

## Dataset Card Contact

Language Technologies Unit (langtech@bsc.es) at the Barcelona Supercomputing Center (BSC).
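The Dataset Structure section describes each row as a prompt plus parallel `instruction_id_list` and `kwargs` arrays, with instructions that "can be verified by heuristics". The following is a minimal Python sketch of how a row might be read and scored, assuming that `kwargs[i]` holds the arguments for `instruction_id_list[i]`, as the example record suggests. The checker functions, the `CHECKERS` registry, whitespace word counting, and the `"less than"` relation are illustrative assumptions, not the official IFEval verification code, which may tokenize and match differently.

```python
import json
import re

# A single JSONL line shaped like the example record in "Dataset Structure"
# (prompt text shortened here for brevity).
EXAMPLE_LINE = json.dumps({
    "key": 1000,
    "prompt": "Escribe un resumen de más de 300 palabras de la página de la Wikipedia: ...",
    "instruction_id_list": [
        "es:punctuation:no_comma",
        "es:detectable_format:number_highlighted_sections",
        "es:length_constraints:number_words",
    ],
    "kwargs": [{}, {"num_highlights": 3}, {"relation": "at least", "num_words": 300}],
}, ensure_ascii=False)


def load_rows(lines):
    """Parse JSONL: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


# Simplified heuristics (hypothetical; NOT the official IFEval checkers).
def check_no_comma(response):
    return "," not in response


def check_highlighted_sections(response, num_highlights):
    # Count non-empty *...* spans that do not cross a line break.
    spans = re.findall(r"\*[^\n*]+\*", response)
    return len(spans) >= num_highlights


def check_number_words(response, relation, num_words):
    # Whitespace tokenization is an assumption made for this sketch.
    count = len(response.split())
    if relation == "at least":
        return count >= num_words
    if relation == "less than":  # assumed second relation, not shown in the card
        return count < num_words
    raise ValueError(f"unknown relation: {relation!r}")


# Hypothetical registry keyed by the instruction ids used in the example.
CHECKERS = {
    "es:punctuation:no_comma": check_no_comma,
    "es:detectable_format:number_highlighted_sections": check_highlighted_sections,
    "es:length_constraints:number_words": check_number_words,
}


def evaluate(row, response):
    """Return {instruction_id: passed} for one prompt/response pair."""
    ids, kwargs = row["instruction_id_list"], row["kwargs"]
    if len(ids) != len(kwargs):
        raise ValueError("instruction_id_list and kwargs must be the same length")
    # kwargs[i] is unpacked as keyword arguments for instruction_id_list[i].
    return {iid: CHECKERS[iid](response, **kw) for iid, kw in zip(ids, kwargs)}


row = load_rows([EXAMPLE_LINE])[0]
good = "*Parte 1* *Parte 2* *Parte 3* " + "palabra " * 300
print(evaluate(row, good))            # all three checks pass
print(evaluate(row, "Hola, mundo"))   # all three checks fail
```

Keeping the verifiers in a dictionary keyed by instruction id mirrors the data layout: adding support for another instruction type only requires registering one more function whose keyword parameters match that instruction's `kwargs` object.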

Provider:

maas

Created:

2025-01-26