MultiNRC
Source: ModelScope community (魔搭社区). Updated: 2025-12-05. Indexed: 2025-12-06.
Download link:
https://modelscope.cn/datasets/ScaleAI/MultiNRC
Resource description:
# MultiNRC: Multilingual Native Reasoning Challenge
MultiNRC is a challenging evaluation benchmark for large language models, designed to assess multilingual reasoning ability in French, Spanish, and Chinese. Unlike existing benchmarks that simply translate English-centric content, MultiNRC consists of over 1,000 native-authored reasoning questions, crafted by native speakers to capture linguistic and cultural nuances.
## Features
- **Languages:** French, Spanish, Chinese
- **Categories:**
  - Language-specific Linguistic Reasoning
  - Wordplay & Riddles
  - Cultural Reasoning & Traditions
  - Math Reasoning with Cultural Relevance
- **English Equivalents:** For the Cultural Reasoning & Traditions and Math Reasoning with Cultural Relevance categories, human-translated English versions are provided for direct comparison.
- **Ground Truth Final Answers:** Short, objective answers accompany each prompt for automatic evaluation.
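
Because the ground-truth answers are short and objective, one simple way to score a model automatically is normalized exact match. The sketch below is only an illustration under that assumption; it is not necessarily the scoring procedure used by the benchmark authors, and the helper names (`normalize_answer`, `exact_match`, `accuracy`) are hypothetical.

```python
import unicodedata


def normalize_answer(text: str) -> str:
    """Normalize a short answer for comparison (illustrative rules, not official)."""
    # NFKC folds full-width digits/punctuation into their ASCII forms;
    # casefold() gives case-insensitive comparison across languages.
    text = unicodedata.normalize("NFKC", text).casefold()
    # Collapse internal whitespace runs into single spaces.
    text = " ".join(text.split())
    # Strip whitespace and common punctuation from both ends only.
    return text.strip(" .,;:!?。¡¿\"'")


def exact_match(prediction: str, ground_truth: str) -> bool:
    """True when the normalized prediction equals the normalized ground truth."""
    return normalize_answer(prediction) == normalize_answer(ground_truth)


def accuracy(pairs) -> float:
    """Fraction of (prediction, ground_truth) pairs that match exactly."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(exact_match(p, g) for p, g in pairs) / len(pairs)
```

Exact match is strict: a correct answer phrased differently from the reference counts as wrong, so a more lenient comparison (for example, a human or model-based judge) may be preferable for free-form outputs.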
## Dataset Structure
Each entry includes:
- A native-language prompt and answer (`i18n_prompt`, `i18n_gtfa`)
- For tasks in the Math Reasoning and Cultural Reasoning categories only: an English-equivalent prompt and answer (`english_prompt`, `english_gtfa`)
- Metadata: `task_id`, `language`, `category`
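
The entry layout above can be sketched as a small typed record. This is a minimal sketch: the field names come from the list above, but the field types, the `MultiNRCEntry` class, the `parse_entry` helper, and the sample rows are assumptions for illustration, and the placeholder values are not taken from the dataset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MultiNRCEntry:
    # Metadata fields (types are assumed to be strings).
    task_id: str
    language: str
    category: str
    # Native-language prompt and ground-truth final answer.
    i18n_prompt: str
    i18n_gtfa: str
    # English equivalents exist only for some categories, so they are optional.
    english_prompt: Optional[str] = None
    english_gtfa: Optional[str] = None

    @property
    def has_english_equivalent(self) -> bool:
        return bool(self.english_prompt) and bool(self.english_gtfa)


def parse_entry(row: dict) -> MultiNRCEntry:
    """Build an entry from a dict-like row; missing or empty English fields become None."""
    return MultiNRCEntry(
        task_id=str(row["task_id"]),
        language=row["language"],
        category=row["category"],
        i18n_prompt=row["i18n_prompt"],
        i18n_gtfa=row["i18n_gtfa"],
        english_prompt=row.get("english_prompt") or None,
        english_gtfa=row.get("english_gtfa") or None,
    )


# Placeholder rows for illustration only; real values come from the dataset.
rows = [
    {"task_id": "demo-1", "language": "<language>", "category": "<math category>",
     "i18n_prompt": "<native prompt>", "i18n_gtfa": "<native answer>",
     "english_prompt": "<English prompt>", "english_gtfa": "<English answer>"},
    {"task_id": "demo-2", "language": "<language>", "category": "<wordplay category>",
     "i18n_prompt": "<native prompt>", "i18n_gtfa": "<native answer>"},
]

entries = [parse_entry(r) for r in rows]
with_english = [e.task_id for e in entries if e.has_english_equivalent]
print(with_english)
```

Keeping the English fields optional makes it straightforward to select only the entries where a native-versus-English comparison is possible.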
## Citation
If you use MultiNRC in your research, please cite:
```bibtex
@article{fabbri2025multinrc,
title = {MultiNRC: A Challenging Native Multilingual Reasoning Evaluation Benchmark for LLMs},
author = {Fabbri, Alexander R. and Mares, Diego and Flores, Jorge and Mankikar, Meher and Hernandez, Ernesto and Lee, Dean and Liu, Bing and Xing, Chen},
year = {2025},
note = {arXiv preprint, arXiv:XXXX.XXXXX}
}
```
Provider:
maas
Created:
2025-09-23



