IFEval_es
ModelScope community · Updated 2025-12-05 · Indexed 2025-02-01
Download link:
https://modelscope.cn/datasets/BSC-LT/IFEval_es
Description:
# Dataset Card for IFEval_es
<!-- Provide a quick summary of the dataset. -->
IFEval_es is a prompt dataset in Spanish, professionally translated from the main version of the [IFEval](https://huggingface.co/datasets/google/IFEval) dataset in English.
## Dataset Details
### Dataset Description
<!-- Provide a longer summary of what this dataset is. -->
IFEval_es (Instruction-Following Eval benchmark - Spanish) is designed to evaluate chat or instruction fine-tuned language models. The dataset comprises 541 prompts built from "verifiable instructions", such as "write in more than 400 words" and "mention the keyword of AI at least 3 times", which can be verified by heuristics. Each instance contains just one input prompt.
- **Curated by:** [Language Technologies Unit | BSC-CNS](https://www.bsc.es/discover-bsc/organisation/research-departments/language-technologies-unit)
- **Funded by:** [ILENIA](https://proyectoilenia.es/en/)
<!-- - **Shared by [optional]:** [More Information Needed] -->
- **Language(s) (NLP):** Spanish (`es-ES`)
- **License:** [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/deed) ([Original](https://huggingface.co/datasets/google/IFEval)) **
### Dataset Sources [optional]
<!-- Provide the basic links for the dataset. -->
- **Repository:** [HuggingFace](https://huggingface.co/datasets/BSC-LT)
<!-- - **Paper [optional]:** [More Information Needed] -->
<!-- - **Demo [optional]:** [More Information Needed] -->
## Uses
<!-- Address questions around how the dataset is intended to be used. -->
IFEval_es is intended to evaluate language models on "verifiable instructions".
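As a rough illustration of what "verifiable by heuristics" means, the two sample instructions from the description (a minimum word count and a minimum number of keyword mentions) can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, words are counted by splitting on whitespace, and none of this is the official IFEval checking code.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (a simplification of word counting)."""
    return len(text.split())


def has_more_words_than(response: str, num_words: int) -> bool:
    """Check an instruction such as 'write in more than 400 words'."""
    return count_words(response) > num_words


def mentions_keyword_at_least(response: str, keyword: str, times: int) -> bool:
    """Check an instruction such as 'mention the keyword at least 3 times'."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, response, flags=re.IGNORECASE)) >= times


# Hypothetical model response in Spanish ("IA" is the Spanish abbreviation for AI).
response = (
    "La IA avanza rápido. Muchos creen que la IA cambiará el trabajo, "
    "y la IA ya está aquí."
)

print(mentions_keyword_at_least(response, "IA", 3))
print(has_more_words_than(response, 400))
```

Because each check is a deterministic function of the response text, the same response always receives the same verdict, which is the property that makes these instructions usable for automatic evaluation.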
### Out-of-Scope Use
<!-- This section addresses misuse, malicious use, and uses that the dataset will not work well for. -->
IFEval_es-test should **not** be used to train any language model.
## Dataset Structure
<!-- This section provides a description of the dataset fields, and additional information about the dataset structure such as criteria used to create the splits, relationships between data points, etc. -->
The dataset is provided in JSONL format, where each row corresponds to one prompt. Each row contains the following fields:
- `key`: instance identifier (an integer in the example below).
- `prompt`: text string with the request.
- `instruction_id_list`: array of identifiers of the verifiable instructions contained in the prompt.
- `kwargs`: array of argument objects, one per entry in `instruction_id_list` and in the same order, used to specify each verifiable instruction.
For example:
```
{
"key": 1000,
"prompt": "Escribe un resumen de más de 300 palabras de la página de la Wikipedia: \"https://es.wikipedia.org/wiki/Raimundo_III_de_Tr%C3%ADpoli\". No utilices comas y destaca al menos 3 secciones que tengan títulos en formato markdown, por ejemplo: *sección destacada parte 1*, *sección destacada parte 2*, *sección destacada parte 3*.",
"instruction_id_list": ["es:punctuation:no_comma", "es:detectable_format:number_highlighted_sections", "es:length_constraints:number_words"],
"kwargs": [{}, {"num_highlights": 3}, {"relation": "at least", "num_words": 300}]
}
```
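The sketch below shows one way a row like this could be consumed: parse the JSONL line, pair each entry of `instruction_id_list` with the argument object at the same position in `kwargs`, and dispatch to a checker. The checker functions, the `CHECKERS` table, and the shortened prompt are assumptions made for illustration; they are not the official IFEval implementation, and only the `"at least"` relation seen in the example is handled.

```python
import json
import re

# A JSONL line shaped like the example row; the prompt is a shortened,
# hypothetical stand-in for the real one.
line = json.dumps({
    "key": 1000,
    "prompt": "Escribe un resumen sin comas con 3 secciones destacadas.",
    "instruction_id_list": [
        "es:punctuation:no_comma",
        "es:detectable_format:number_highlighted_sections",
        "es:length_constraints:number_words",
    ],
    "kwargs": [{}, {"num_highlights": 3}, {"relation": "at least", "num_words": 300}],
})


def check_no_comma(response):
    """Pass when the response contains no comma."""
    return "," not in response


def check_highlighted_sections(response, num_highlights):
    """Pass when enough *highlighted* spans (single asterisks) are present."""
    return len(re.findall(r"\*[^\n*]+\*", response)) >= num_highlights


def check_number_words(response, relation, num_words):
    """Pass when the whitespace-token count satisfies the relation."""
    if relation != "at least":
        raise ValueError(f"relation not handled in this sketch: {relation!r}")
    return len(response.split()) >= num_words


# Hypothetical dispatch table from instruction identifier to checker.
CHECKERS = {
    "es:punctuation:no_comma": check_no_comma,
    "es:detectable_format:number_highlighted_sections": check_highlighted_sections,
    "es:length_constraints:number_words": check_number_words,
}


def evaluate(row, response):
    """Return a pass/fail verdict per instruction for one response."""
    return {
        instruction_id: CHECKERS[instruction_id](response, **kwargs)
        for instruction_id, kwargs in zip(row["instruction_id_list"], row["kwargs"])
    }


row = json.loads(line)
good = "*parte 1* *parte 2* *parte 3* " + " ".join(["palabra"] * 300)
bad = "Hola, mundo"

print(evaluate(row, good))
print(evaluate(row, bad))
```

Keeping `instruction_id_list` and `kwargs` as parallel arrays means the empty object `{}` still has to be present for instructions that take no arguments, so that positions stay aligned.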
IFEval_es contains the train split from the main version of the original dataset.
## Dataset Creation
### Curation Rationale
<!-- Motivation for the creation of this dataset. -->
From the paper by Zhou, J. et al. (2023), "Instruction-Following Evaluation for Large Language Models":
> Evaluation of instruction-following abilities in LLMs is not standardized: Human evaluations are expensive, slow, and not objectively reproducible, while LLM-based auto-evaluation is potentially biased or limited by the ability of the evaluator LLM. To overcome these issues, we introduce Instruction-Following Eval (IFEval) for large language models.
We have translated this dataset to improve the Spanish support in the NLP field and to allow cross-lingual comparisons in language models.
### Source Data
<!-- This section describes the source data (e.g. news text and headlines, social media posts, translated sentences, ...). -->
IFEval_es comes from the main version of [IFEval](https://huggingface.co/datasets/google/IFEval), which is inspired by recurring prompts that are given to any language-model assistant.
#### Data Collection and Processing
<!-- This section describes the data collection and processing process such as data selection criteria, filtering and normalization methods, tools and libraries used, etc. -->
Data was gathered from the main version of [IFEval](https://huggingface.co/datasets/google/IFEval). We did not modify the original dataset.
The translation process to Spanish was based on the following guidelines:
- **Date & Unit conversion**: Adapt dates, metric systems, currencies, etc., to our context, except when the task involves metric system conversion.
- **Personal Names**: Translate English names with clear Spanish equivalents; otherwise, use common names in our context. Maintain consistency in translated names throughout the text. Names of individual figures are not translated.
- **Language Style**: Avoid uniformity in translation, maintaining a rich and varied language reflecting our linguistic depth.
- **Dataset Logic**: Ensure internal logic of datasets is maintained; answers should remain relevant and accurate. Factual accuracy is key in question-answer datasets. Maintain the correct option in multiple-choice datasets.
- **Error Handling**: Fix errors in the English text during translation unless otherwise specified for the specific dataset. Spelling mistakes must be corrected in Spanish.
- **Avoiding Patterns and Maintaining Length**: Avoid including patterns that could hint at the correct option, maintaining difficulty. Match the length of responses to the original text as closely as possible. Handle scientific terminology carefully to ensure consistency.
#### Who are the source data producers?
<!-- This section describes the people or systems who originally created the data. It should also include self-reported demographic or identity information for the source data creators if this information is available. -->
IFEval_es is a professional translation of [IFEval](https://huggingface.co/datasets/google/IFEval), completed by a single translator who is a native speaker of Spanish. The translator was provided with the entire test split, as well as a set of translation preferences and guidelines, along with a brief explanation of the original corpus. To ensure ongoing communication, the translator was asked to provide sample translations at regular intervals. These translations were then reviewed by a Spanish speaker within our team, who later translated and verified the metadata.
Additionally, the translator was encouraged to seek clarification on any specific doubts they had, and any necessary corrections were applied to the entire dataset.
### Annotations [optional]
<!-- If the dataset contains annotations which are not part of the initial data collection, use this section to describe them. -->
#### Annotation process
<!-- This section describes the annotation process such as annotation tools used in the process, the amount of data annotated, annotation guidelines provided to the annotators, interannotator statistics, annotation validation, etc. -->
Refer to the original paper (Zhou, J. et al. (2023). Instruction-Following Evaluation for Large Language Models.).
#### Who are the annotators?
<!-- This section describes the people or systems who created the annotations. -->
Refer to the original paper (Zhou, J. et al. (2023). Instruction-Following Evaluation for Large Language Models.).
#### Personal and Sensitive Information
<!-- State whether the dataset contains data that might be considered personal, sensitive, or private (e.g., data that reveals addresses, uniquely identifiable names or aliases, racial or ethnic origins, sexual orientations, religious beliefs, political opinions, financial or health data, etc.). If efforts were made to anonymize the data, describe the anonymization process. -->
No personal or sensitive information is included.
## Bias, Risks, and Limitations
<!-- This section is meant to convey both technical and sociotechnical limitations. -->
[More Information Needed]
### Recommendations
<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
Users should be made aware of the risks, biases and limitations of the dataset. More information needed for further recommendations.
## Citation [optional]
<!-- If there is a paper or blog post introducing the dataset, the APA and Bibtex information for that should go in this section. -->
**BibTeX:**
```
@misc{zhou2023instructionfollowingevaluationlargelanguage,
title={Instruction-Following Evaluation for Large Language Models},
author={Jeffrey Zhou and Tianjian Lu and Swaroop Mishra and Siddhartha Brahma and Sujoy Basu and Yi Luan and Denny Zhou and Le Hou},
year={2023},
eprint={2311.07911},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2311.07911},
}
```
**APA:**
[More Information Needed]
## Glossary [optional]
<!-- If relevant, include terms and calculations in this section that can help readers understand the dataset or dataset card. -->
[More Information Needed]
## More Information [optional]
This work is funded by the Ministerio para la Transformación Digital y de la Función Pública and by the EU (NextGenerationEU), within the framework of the [ILENIA project](https://proyectoilenia.es/) with reference 2022/TL22/00215337.
** The license was changed to CC-BY-4.0 because the original authors only specified the default license, Apache 2.0, which is meant for software rather than data artifacts and does not require derivative works to be licensed under the same terms.
## Dataset Card Authors [optional]
[More Information Needed]
## Dataset Card Contact
Language Technologies Unit (langtech@bsc.es) at the Barcelona Supercomputing Center (BSC).
Provider: maas
Created: 2025-01-26



