toloka/mu-math

Name: toloka/mu-math
Creator: toloka
Published: 2026-01-30 19:22:58
License: No description available

Source: Hugging Face
Updated: 2026-01-30
Indexed: 2024-12-14

Download link:

https://hf-mirror.com/datasets/toloka/mu-math

Description:

μ-MATH is a meta-evaluation dataset for assessing how well large language models (LLMs) judge free-form mathematical solutions. It contains 1,084 labeled samples derived from 271 U-MATH tasks, covering problems of varying assessment complexity. The solutions were generated by four top-performing LLMs (Llama-3.1 70B, Qwen2.5 72B, GPT-4o, and Gemini 1.5 Pro), which is consistent with one solution per model for each task (271 × 4 = 1,084), and were labeled by math experts and formal auto-verification tools. The benchmark tests LLMs acting as evaluators, that is, how accurately they judge whether a free-form solution is correct. The primary evaluation metric is the macro F1-score; secondary metrics are the True Positive Rate (TPR), True Negative Rate (TNR), Positive Predictive Value (PPV), and Negative Predictive Value (NPV).
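The metrics named in the description can be computed from a judge's binary verdicts and the gold labels. The following is a minimal sketch, not the dataset authors' scoring code. The function name `judge_metrics`, the choice of "solution is correct" as the positive class, and the convention of returning 0.0 when a denominator is empty are all assumptions made for illustration.

```python
from typing import Dict, Sequence


def judge_metrics(gold: Sequence[bool], pred: Sequence[bool]) -> Dict[str, float]:
    """Score binary judge verdicts against gold labels.

    Hypothetical helper: True means "the solution is correct" (assumed
    positive class). Empty denominators yield 0.0 by assumption.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    tn = sum(1 for g, p in zip(gold, pred) if not g and not p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0

    # Per-class F1, then the unweighted mean over both classes (macro F1).
    f1_pos = ratio(2 * tp, 2 * tp + fp + fn)
    f1_neg = ratio(2 * tn, 2 * tn + fn + fp)

    return {
        "tpr": ratio(tp, tp + fn),  # True Positive Rate (recall on correct solutions)
        "tnr": ratio(tn, tn + fp),  # True Negative Rate (recall on incorrect solutions)
        "ppv": ratio(tp, tp + fp),  # Positive Predictive Value
        "npv": ratio(tn, tn + fn),  # Negative Predictive Value
        "macro_f1": (f1_pos + f1_neg) / 2,
    }
```

Macro F1 averages the F1 of the "correct" and "incorrect" classes with equal weight, so a judge that labels every solution as correct scores poorly even when most solutions are in fact correct.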

Provider:

toloka
