Marco-Bench-MIF

Name: Marco-Bench-MIF
Creator: maas
Published: 2026-05-19 19:20:40
License: No description provided

ModelScope community. Updated: 2026-05-19. Indexed: 2025-11-22.

Download link:

https://modelscope.cn/datasets/AIDC-AI/Marco-Bench-MIF

Resource description:

# Marco-Bench-MIF: A Benchmark for Multilingual Instruction-Following Evaluation

[![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://www.apache.org/licenses/LICENSE-2.0) [![ACL 2025](https://img.shields.io/badge/ACL-2025-blue)](https://aclanthology.org/2025.acl-long.1172/) [![arXiv](https://img.shields.io/badge/arXiv-2507.11882-b31b1b.svg)](https://arxiv.org/abs/2507.11882)

## Introduction

Marco-Bench-MIF is the first deeply localized multilingual benchmark designed to evaluate instruction-following capabilities across 30 languages. Unlike existing benchmarks that rely primarily on machine translation, Marco-Bench-MIF implements fine-grained cultural adaptations to provide more accurate assessment. Our research demonstrates that machine-translated data underestimates model performance by 7-22% in multilingual environments.

## Key Features

- **Extensive Language Coverage**: 30 languages spanning 6 major language families, including high-resource (English, Chinese, German) and low-resource languages (Yoruba, Nepali)
- **Deep Cultural Localization**: Three-step process of lexical replacement, theme transformation, and pragmatic reconstruction to ensure cultural and linguistic appropriateness
- **Diverse Constraint Types**: 541 instruction-response pairs covering single/multiple constraints, expressive/content constraints, and various instruction types
- **Comparative Dataset**: Machine-translated and culturally-localized versions available for specific languages (Arabic, Chinese, Spanish, etc.) to enable comparative research

## Dataset Access

The dataset will be available through our GitHub repository and Hugging Face:

```bash
git clone https://github.com/AIDC-AI/Marco-Bench-MIF.git
```

## Key Findings

Our benchmark evaluated 20+ LLM models and revealed:

1. Model scale strongly correlates with performance, with 70B+ models outperforming 8B models by 45-60%
2. A 25-35% performance gap exists between high-resource languages (German, Chinese) and low-resource languages (Yoruba, Nepali)
3. Significant differences between localized and machine-translated evaluations, especially for complex instructions

## Contact

For questions or suggestions, please submit a GitHub issue or contact us:

- Email: lyuchenyang.lcy@alibaba-inc.com
- Project homepage: https://github.com/AIDC-AI/Marco-Bench-MIF

## License

This dataset is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0).

## Acknowledgments

Special thanks to all annotators and translators who participated in dataset construction and validation. This project is supported by Alibaba International Digital Commerce Group.
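## Example: Checking Constraints (Illustrative)

The Key Features section mentions single and multiple constraints on responses. As a rough illustration of what rule-based checking of such constraints might look like, here is a minimal sketch. It is not the benchmark's evaluation code: the `Constraint` class, the three checker functions, and `evaluate` are all hypothetical names introduced for this example, and the constraint types shown are assumptions rather than ones taken from the dataset.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Constraint:
    """A named, rule-based check applied to a model response (hypothetical structure)."""
    name: str
    check: Callable[[str], bool]


def min_words(n: int) -> Constraint:
    # Assumption: words are whitespace-separated. This does not hold for
    # unsegmented scripts such as Chinese, which need a different tokenizer.
    return Constraint(f"min_words_{n}", lambda text: len(text.split()) >= n)


def contains_keyword(keyword: str) -> Constraint:
    # Case-insensitive substring match on the required keyword.
    return Constraint(f"contains_{keyword}", lambda text: keyword.lower() in text.lower())


def no_commas() -> Constraint:
    # Only checks the ASCII comma; other scripts use different comma characters.
    return Constraint("no_commas", lambda text: "," not in text)


def evaluate(response: str, constraints: List[Constraint]) -> Dict[str, object]:
    """Return per-constraint results plus a flag that is True only if every constraint passes."""
    results = {c.name: c.check(response) for c in constraints}
    return {"per_constraint": results, "all_satisfied": all(results.values())}


if __name__ == "__main__":
    constraints = [min_words(5), contains_keyword("benchmark"), no_commas()]
    report = evaluate("This benchmark covers thirty languages in total", constraints)
    print(report)
```

Reporting both the per-constraint results and the all-satisfied flag lets a multi-constraint instruction be scored either per constraint or per prompt. The comments on `min_words` and `no_commas` point at one reason language-specific handling matters: checks written for English do not transfer unchanged to other scripts.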

Provider: maas

Created: 2025-10-27