LiveMathBench
ModelScope Community · Updated 2026-01-06 · Indexed 2025-01-25
Download link:
https://modelscope.cn/datasets/opencompass/LiveMathBench
Resource description:
# Dataset Card for "LiveMathBench"
- **Homepage:** [https://open-compass.github.io/GPassK/](https://open-compass.github.io/GPassK/)
- **Repository:** [https://github.com/open-compass/GPassK](https://github.com/open-compass/GPassK)
- **Paper:** [Are Your LLMs Capable of Stable Reasoning?](https://arxiv.org/abs/2412.13147)
## Introduction
LiveMathBench is a mathematical dataset built from challenging, recently released questions from various mathematical competitions. Its aim is to avoid the data contamination issues that affect existing LLMs and public math benchmarks.
## Leaderboard
The latest leaderboard is available on our [leaderboard page](https://open-compass.github.io/GPassK/).
## Data
### v202412
The 202412 version of LiveMathBench contains 238 mathematical questions from the China National Mathematical Olympiad (CNMO), China's College Entrance Examination (CCEE), the American Mathematics Competition (AMC), and the William Lowell Putnam Mathematical Competition (WLPMC).
Here is an example:
```
question: A sequence $y_1,y_2,\dots,y_k$ of real numbers is called \emph{zigzag} if $k=1$, or if $y_2-y_1, y_3-y_2, \dots, y_k-y_{k-1}$ are nonzero and alternate in sign. Let $X_1,X_2,\dots,X_n$ be chosen independently from the uniform distribution on $[0,1]$. Let $a(X_1,X_2,\dots,X_n)$ be the largest value of $k$ for which there exists an increasing sequence of integers $i_1,i_2,\dots,i_k$ such that $X_{i_1},X_{i_2},\dots,X_{i_k}$ is zigzag. Find the expected value of $a(X_1,X_2,\dots,X_n)$ for $n \geq 2$.
answer: $\frac{2n+2}{3}$
question_type: Problem-Solving
```
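The reference answer in the example above can be checked numerically. The following sketch is an illustration only and is not part of LiveMathBench or the GPassK tooling. The helper names (`longest_zigzag`, `longest_zigzag_brute`, `estimate_expectation`) are hypothetical. The sketch assumes the sampled values are distinct, which holds with probability 1 for continuous uniform draws. It relies on a standard observation: for distinct values, the longest zigzag subsequence has length 2 plus the number of interior local extrema. A brute-force search cross-checks that shortcut on small inputs.

```python
import random
from itertools import combinations


def longest_zigzag(xs):
    """Longest zigzag subsequence length, assuming distinct values.

    For n >= 2 it equals 2 plus the number of interior indices where the
    direction of the sequence changes (local maxima or minima).
    """
    n = len(xs)
    if n <= 1:
        return n
    turns = sum(
        1
        for i in range(1, n - 1)
        if (xs[i] - xs[i - 1]) * (xs[i + 1] - xs[i]) < 0
    )
    return turns + 2


def is_zigzag(ys):
    """Direct check of the definition: k == 1, or the consecutive
    differences are nonzero and alternate in sign."""
    diffs = [b - a for a, b in zip(ys, ys[1:])]
    if any(d == 0 for d in diffs):
        return False
    return all(d1 * d2 < 0 for d1, d2 in zip(diffs, diffs[1:]))


def longest_zigzag_brute(xs):
    """Exhaustive search over increasing index tuples (small n only)."""
    for k in range(len(xs), 0, -1):
        for idx in combinations(range(len(xs)), k):
            if is_zigzag([xs[i] for i in idx]):
                return k
    return 0


def estimate_expectation(n, trials=20_000, seed=0):
    """Monte Carlo estimate of E[a(X_1, ..., X_n)] for uniform X_i."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += longest_zigzag([rng.random() for _ in range(n)])
    return total / trials


if __name__ == "__main__":
    for n in (2, 3, 5, 10):
        print(n, estimate_expectation(n), (2 * n + 2) / 3)
```

Each interior point is a local extremum with probability $2/3$. By linearity of expectation, $E[a] = 2 + \frac{2(n-2)}{3} = \frac{2n+2}{3}$, which the printed estimates should approach as `trials` grows.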
### v202505
The 202505 version of LiveMathBench contains 100 mathematical questions from various countries, including questions written in languages other than English.
## Citation
```
@article{liu2024your,
title={Are Your LLMs Capable of Stable Reasoning?},
author={Liu, Junnan and Liu, Hongwei and Xiao, Linchen and Wang, Ziyi and Liu, Kuikun and Gao, Songyang and Zhang, Wenwei and Zhang, Songyang and Chen, Kai},
journal={arXiv preprint arXiv:2412.13147},
year={2024}
}
```
Provider: maas
Created: 2025-01-23