haowu89/math-ai-bench-sources-latest

Name: haowu89/math-ai-bench-sources-latest
Creator: haowu89
Published: 2026-04-06 21:29:25
License: MIT (per the dataset card front matter; the listing page itself shows no description)

Source: Hugging Face. Updated: 2026-04-06. Indexed: 2026-04-12.

Download link:

https://hf-mirror.com/datasets/haowu89/math-ai-bench-sources-latest

Description:

---
license: mit
task_categories:
- text-generation
language:
- en
tags:
- math
- reasoning
- benchmark
- trajectories
size_categories:
- 1K<n<10K
---

# math-ai-bench-sources-latest

This dataset is an updated aggregated multi-trajectory benchmark built from the latest `parallelthinking_benchmark` files under `/scratch/haowu/datasets/datasets/parallelthinking_benchmark_latest`.

It follows the same high-level format as [`haowu89/math-ai-bench-sources`](https://huggingface.co/datasets/haowu89/math-ai-bench-sources), but it is a newer version with:

- updated benchmark composition
- updated model set
- aligned question coverage across all included models

## Included Models

- `Qwen2.5-1.5B-Instruct`
- `Qwen3-1.7B`
- `Qwen3-4B`
- `Qwen3-30B-A3B-Thinking-2507`

## File

- `math-ai-bench-sources-latest.jsonl`

## Data Construction

The source data is built from math-focused benchmark subsets. For each included model, each `(source, index)` question keeps exactly 8 reasoning trajectories.

This file is the aggregated version of the per-trajectory JSONL files:

- one row per `(model, source, index)`
- all 8 trajectories are packed into `generated_solutions`

## Dataset Size

- total rows: `9276`
- rows per model: `2319`
- trajectories per row: `8`

## Benchmark Sources

- `aime25`
- `aime26`
- `apex_2025`
- `arxivmath`
- `cmimc_2025`
- `gpqa_diamond`
- `hmmt_feb_2026`
- `hmmt_nov_2025`
- `imobench`
- `olympiadbench`
- `theoremqa`

## JSONL Schema

Each line is a JSON object with fields:

- `problem` (`str`): question text
- `original_solution` (`str`): reference/original solution if available
- `answer` (`str`): ground-truth answer
- `source` (`str`): benchmark subset name
- `index` (`int`): question index within the subset
- `model` (`str`): model name
- `generated_solutions` (`List[str]`): 8 reasoning trajectories for the same question
- `count` (`int`): number of trajectories, always `8`

## Notes

- Use `(source, index)` to align the same question across different models.
- This is an aggregated multi-trajectory export. The original single-trajectory fields `generated_solution` and `sample` are replaced by `generated_solutions` and `count`.
- All four included models are aligned to the same question set.
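
## Example: Reading and Aligning Rows

The schema and the `(source, index)` alignment note above can be sketched in plain Python. This is a minimal illustration, not code shipped with the dataset: the helper names `parse_rows`, `align_by_question`, and `load_jsonl` are hypothetical, and the sketch assumes the file has already been downloaded locally under the name listed in the File section. It checks only that the documented fields are present and that `count` matches the length of `generated_solutions`; it does not assume anything about the contents of `original_solution`, which the card describes as present only "if available".

```python
import json
from collections import defaultdict

# Field names taken from the JSONL schema described in the dataset card.
REQUIRED_FIELDS = (
    "problem", "original_solution", "answer", "source",
    "index", "model", "generated_solutions", "count",
)


def parse_rows(lines):
    """Parse an iterable of JSONL lines into dicts and sanity-check each row."""
    rows = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        row = json.loads(line)
        missing = [name for name in REQUIRED_FIELDS if name not in row]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        if not isinstance(row["generated_solutions"], list):
            raise ValueError(f"line {line_no}: generated_solutions is not a list")
        if row["count"] != len(row["generated_solutions"]):
            raise ValueError(f"line {line_no}: count does not match trajectories")
        rows.append(row)
    return rows


def align_by_question(rows):
    """Group rows as {(source, index): {model: row}} to compare models per question."""
    aligned = defaultdict(dict)
    for row in rows:
        aligned[(row["source"], row["index"])][row["model"]] = row
    return dict(aligned)


def load_jsonl(path):
    """Read a local JSONL file (hypothetical helper; the path is the caller's choice)."""
    with open(path, encoding="utf-8") as handle:
        return parse_rows(handle)
```

A typical use would be `aligned = align_by_question(load_jsonl("math-ai-bench-sources-latest.jsonl"))`, after which `aligned[("aime25", 0)]` (an illustrative key) would map each model name to its row for that question. If the card's numbers hold, the result should contain 2319 keys with four models each, since 4 × 2319 = 9276 rows.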
