
PolyMath

Source: ModelScope · Updated 2026-05-02 · Indexed 2025-11-03
Download link:
https://modelscope.cn/datasets/evalscope/PolyMath
Description:
<div align="center"> <h2> PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts </h2> </div>

<div align="center">
<a href="https://arxiv.org/abs/2504.18428"> <img src="https://img.shields.io/badge/arXiv-2504.18428-b31b1b.svg?logo=arxiv" alt="arXiv Badge"/> </a>
<a href="https://github.com/QwenLM/PolyMath"> <img src="https://img.shields.io/badge/GitHub-Code-black?logo=github" alt="GitHub Badge"/> </a>
</div>

**PolyMath** is a multilingual mathematical reasoning benchmark **covering 18 languages** and **4 easy-to-hard difficulty levels**. Our benchmark ensures *difficulty comprehensiveness*, *language diversity*, and *high-quality translation*, making it a highly discriminative multilingual mathematical benchmark in the era of reasoning LLMs.

- 📈 **Broad Difficulty Range:** PolyMath defines and partitions mathematical difficulty across four levels using two core dimensions: **Thought Depth** and **Knowledge Breadth**. The levels range from K-12 to Olympiad and advanced frontier mathematics, with 125 problems per language at each level.

<div align="center"> <img src="_ASSETS/level.png" alt="logo" width="85%"/> </div>

- 🌍 **Language Diversity:** Each problem in PolyMath is available in **18 parallel language versions**, encompassing **over 75% of the world’s native speakers** and major language families, ensuring diversity across both high-resource and low-resource languages.

<div align="center"> <img src="_ASSETS/language.png" alt="logo" width="50%"/> </div>

- 🧑‍🏫 **High-Quality Annotation:** Each problem translation is **calibrated by language experts** rather than taken directly from LLM-generated output, ensuring precise terminology and logical clarity.

<div align="center"> <img src="_ASSETS/human.png" alt="logo" width="90%"/> </div>

---

## 📊 Main Results

The leaderboard is continuously updated! See https://qwen-polymath.github.io/#leaderboard

---

## 📄 Citation

If you use **PolyMath** in your research, please cite us:

```bibtex
@article{wang2025polymath,
  title={PolyMath: Evaluating Mathematical Reasoning in Multilingual Contexts},
  author={Yiming Wang and Pei Zhang and Jialong Tang and Haoran Wei and Baosong Yang and Rui Wang and Chenshu Sun and Feitong Sun and Jiran Zhang and Junxuan Wu and Qiqian Cang and Yichang Zhang and Fei Huang and Junyang Lin and Fei Huang and Jingren Zhou},
  journal={arXiv preprint arXiv:2504.18428},
  year={2025},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2504.18428},
}
```
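The benchmark's size follows directly from the figures in the description: 125 problems per language at each of 4 levels gives 500 problems per language, and 18 parallel language versions imply 9,000 problem instances in total. The Python sketch below enumerates that parallel layout to make the arithmetic concrete. It is an illustration, not the dataset's loader. The language identifiers and level names are hypothetical placeholders, not the dataset's actual field values or split names. The sketch also assumes that each level holds 125 distinct source problems, each translated into every language.

```python
from collections import Counter
from itertools import product

# Figures stated in the PolyMath description.
NUM_LANGUAGES = 18
NUM_LEVELS = 4
PROBLEMS_PER_LANGUAGE_PER_LEVEL = 125

# Hypothetical placeholder identifiers; the real dataset's language codes
# and level names are not given here and may differ.
languages = [f"lang_{i:02d}" for i in range(NUM_LANGUAGES)]
levels = [f"level_{j}" for j in range(1, NUM_LEVELS + 1)]


def build_index(languages, levels, per_cell):
    """Enumerate (language, level, problem_id) triples for a parallel benchmark.

    Assumes every language shares the same problem_ids within a level,
    i.e. each source problem has exactly one translated version per language.
    """
    return [
        (lang, level, pid)
        for lang, level in product(languages, levels)
        for pid in range(per_cell)
    ]


index = build_index(languages, levels, PROBLEMS_PER_LANGUAGE_PER_LEVEL)

# Count instances per language and collect the distinct (level, problem_id) pairs.
per_language = Counter(lang for lang, _, _ in index)
distinct_problems = {(level, pid) for _, level, pid in index}

print(len(index))                   # 9000 problem instances across all languages
print(per_language[languages[0]])   # 500 problems per language
print(len(distinct_problems))       # 500 distinct source problems
```

If you replace the placeholders with the language codes and level labels from a downloaded copy, the same counters can serve as a rough completeness check on that copy.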

Provided by:
maas
Created:
2025-10-13