
MultiNRC

Source: ModelScope community. Updated 2025-12-05; indexed 2025-12-06.
Download link:
https://modelscope.cn/datasets/ScaleAI/MultiNRC
Description:
# MultiNRC: Multilingual Native Reasoning Challenge

MultiNRC is a challenging evaluation benchmark for large language models, designed to assess multilingual reasoning ability in French, Spanish, and Chinese. Unlike existing benchmarks that simply translate English-centric content, MultiNRC consists of over 1,000 native-authored reasoning questions, crafted by native speakers to capture linguistic and cultural nuances.

## Features

- **Languages:** French, Spanish, Chinese
- **Categories:**
  - Language-specific Linguistic Reasoning
  - Wordplay & Riddles
  - Cultural Reasoning & Traditions
  - Math Reasoning with Cultural Relevance
- **English Equivalents:** For Cultural/Tradition and Math, human-translated English versions are provided for direct comparison.
- **Ground Truth Final Answers:** Short, objective answers accompany each prompt for automatic evaluation.

## Dataset Structure

Each entry includes:

- A native-language prompt and answer (`i18n_prompt`, `i18n_gtfa`)
- (For Math Reasoning and Cultural Reasoning category tasks) An English-equivalent prompt and answer (`english_prompt`, `english_gtfa`)
- Metadata: `task_id`, `language`, `category`

## Citation

If you use MultiNRC in your research, please cite:

```bibtex
@article{fabbri2025multinrc,
  title  = {MultiNRC: A Challenging Native Multilingual Reasoning Evaluation Benchmark for LLMs},
  author = {Fabbri, Alexander R. and Mares, Diego and Flores, Jorge and Mankikar, Meher and Hernandez, Ernesto and Lee, Dean and Liu, Bing and Xing, Chen},
  year   = {2025},
  note   = {arXiv preprint, arXiv:XXXX.XXXXX}
}
```
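
The record layout described under Dataset Structure can be sketched in Python as follows. This is a minimal illustration, not the benchmark's official evaluation code. The field names (`task_id`, `language`, `category`, `i18n_prompt`, `i18n_gtfa`, `english_prompt`, `english_gtfa`) come from the card. The sample records, the exact strings used for `language` and `category`, the helper names, and the normalized exact-match scoring rule are all assumptions made for illustration.

```python
from collections import defaultdict

# Field names follow the dataset card. The values are made-up placeholders,
# NOT real MultiNRC entries, and the language/category strings are assumed.
SAMPLE_RECORDS = [
    {
        "task_id": "demo-1",
        "language": "French",
        "category": "Wordplay & Riddles",
        "i18n_prompt": "<prompt in French>",
        "i18n_gtfa": "Paris",
    },
    {
        "task_id": "demo-2",
        "language": "Spanish",
        "category": "Math Reasoning with Cultural Relevance",
        "i18n_prompt": "<prompt in Spanish>",
        "i18n_gtfa": "42",
        # English equivalents exist only for the Math and Cultural categories.
        "english_prompt": "<English equivalent>",
        "english_gtfa": "42",
    },
]


def normalize(text):
    """Lowercase and collapse whitespace (an assumed, simplistic matching rule)."""
    return " ".join(str(text).lower().split())


def has_english_equivalent(record):
    """True when the record carries both English-equivalent fields."""
    return bool(record.get("english_prompt")) and bool(record.get("english_gtfa"))


def accuracy_by_group(records, predictions, answer_field="i18n_gtfa"):
    """Exact-match accuracy per (language, category).

    `predictions` maps task_id -> model answer. Records that lack
    `answer_field` (e.g. no English equivalent) are skipped.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for rec in records:
        gold = rec.get(answer_field)
        if gold is None:
            continue
        key = (rec["language"], rec["category"])
        total[key] += 1
        if normalize(predictions.get(rec["task_id"], "")) == normalize(gold):
            correct[key] += 1
    return {key: correct[key] / total[key] for key in total}
```

Calling `accuracy_by_group(records, predictions)` scores the native-language answers. Passing `answer_field="english_gtfa"` restricts scoring to the entries that have an English equivalent, which allows the direct native-versus-English comparison the card mentions. Short ground-truth answers make exact matching plausible, but a real evaluation may need a more tolerant comparison.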

Provider:
maas
Created:
2025-09-23