livebench_math

Name: livebench_math
Creator: maas
Published: 2025-11-27 16:27:57
License: No description available

ModelScope Community · Updated 2025-11-27 · Indexed 2025-03-29

Download link:

https://modelscope.cn/datasets/AI-ModelScope/livebench_math

Resource description:

# Dataset Card for "livebench/math"

LiveBench is a benchmark for LLMs designed with test set contamination and objective evaluation in mind. It has the following properties:

- LiveBench is designed to limit potential contamination by releasing new questions monthly, as well as having questions based on recently released datasets, arXiv papers, news articles, and IMDb movie synopses.
- Each question has verifiable, objective ground-truth answers, allowing hard questions to be scored accurately and automatically, without the use of an LLM judge.
- LiveBench currently contains a set of 18 diverse tasks across 6 categories, and we will release new, harder tasks over time.

This is the math category of LiveBench. See more in our [paper](https://arxiv.org/abs/2406.19314), [leaderboard](https://livebench.ai/), and [datasheet](https://github.com/LiveBench/LiveBench/blob/main/docs/DATASHEET.md).
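Because every question has an objective ground-truth answer, a model's response can be graded by a deterministic comparison instead of an LLM judge. The following is a minimal sketch of that idea, assuming answers are short strings and that numeric answers may be written as fractions or decimals (for example `1/2` vs `0.5`). The `normalize` and `score` helpers are hypothetical illustrations, not LiveBench's own grading code.

```python
from fractions import Fraction


def normalize(answer: str) -> str:
    """Strip surrounding whitespace, an outer \\boxed{...} wrapper,
    all inner whitespace, and letter case from an answer string."""
    text = answer.strip()
    if text.startswith("\\boxed{") and text.endswith("}"):
        text = text[len("\\boxed{"):-1]
    return "".join(text.split()).lower()


def score(prediction: str, ground_truth: str) -> int:
    """Return 1 if the prediction matches the ground truth, else 0."""
    pred, truth = normalize(prediction), normalize(ground_truth)
    if pred == truth:
        return 1
    # Fall back to exact rational comparison for numeric answers,
    # so "0.5" and "1/2" are treated as equal without float rounding.
    try:
        return int(Fraction(pred) == Fraction(truth))
    except (ValueError, ZeroDivisionError):
        # Non-numeric answers that did not match as strings score 0.
        return 0


print(score("\\boxed{1/2}", "0.5"))  # numeric equivalence
print(score("x+1", "1+x"))           # no symbolic comparison in this sketch
```

The string-then-rational fallback keeps the grader fully deterministic, but it only recognizes textual and numeric equality: `x+1` and `1+x` score 0 here, so algebraic answers would need a proper symbolic comparison, which this sketch does not attempt.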

Provider:

maas

Created:

2025-03-28