haowu89/math-ai-bench-sources-latest
Hugging Face · Updated 2026-04-06 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/haowu89/math-ai-bench-sources-latest
Description:
---
license: mit
task_categories:
- text-generation
language:
- en
tags:
- math
- reasoning
- benchmark
- trajectories
size_categories:
- 1K<n<10K
---
# math-ai-bench-sources-latest
This dataset is an updated, aggregated multi-trajectory benchmark built from the latest `parallelthinking_benchmark` files under `/scratch/haowu/datasets/datasets/parallelthinking_benchmark_latest`.
It follows the same high-level format as [`haowu89/math-ai-bench-sources`](https://huggingface.co/datasets/haowu89/math-ai-bench-sources), but is a newer version with:
- updated benchmark composition
- updated model set
- aligned question coverage across all included models
## Included Models
- `Qwen2.5-1.5B-Instruct`
- `Qwen3-1.7B`
- `Qwen3-4B`
- `Qwen3-30B-A3B-Thinking-2507`
## File
- `math-ai-bench-sources-latest.jsonl`
## Data Construction
The source data is built from math-focused benchmark subsets. For each included model, each `(source, index)` question keeps exactly 8 reasoning trajectories.
This file is the aggregated version of the per-trajectory JSONL files:
- one row per `(model, source, index)`
- all 8 trajectories are packed into `generated_solutions`
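The packing step described above can be sketched roughly as follows. This is an illustration, not the author's build script: the function name `aggregate_trajectories` is hypothetical, and it assumes that the per-trajectory `sample` field is a sortable sample id and that `problem`, `original_solution`, and `answer` are identical across the trajectories of one question.

```python
from collections import defaultdict


def aggregate_trajectories(rows, expected=8):
    """Pack per-trajectory rows into one row per (model, source, index).

    Assumption: each input row carries the single-trajectory fields
    `generated_solution` and `sample`, plus the shared question fields.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[(row["model"], row["source"], row["index"])].append(row)

    aggregated = []
    for (model, source, index), members in groups.items():
        if len(members) != expected:
            raise ValueError(
                f"{(model, source, index)} has {len(members)} trajectories, "
                f"expected {expected}"
            )
        # Assumed ordering: sort trajectories by their `sample` id.
        members.sort(key=lambda r: r["sample"])
        first = members[0]
        aggregated.append({
            "problem": first["problem"],
            "original_solution": first["original_solution"],
            "answer": first["answer"],
            "source": source,
            "index": index,
            "model": model,
            "generated_solutions": [m["generated_solution"] for m in members],
            "count": len(members),
        })
    return aggregated
```

Raising on a group that does not have exactly 8 members mirrors the "exactly 8 reasoning trajectories" guarantee; a real pipeline might instead drop or log such groups.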
## Dataset Size
- total rows: `9276` (4 models × `2319` rows each)
- rows per model: `2319`
- trajectories per row: `8`
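A minimal sketch for checking these figures against a local copy of the file. The helper names `iter_jsonl` and `size_summary` are hypothetical, and the file is assumed to be UTF-8 encoded with one JSON object per line.

```python
import json
from collections import Counter


def iter_jsonl(path):
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def size_summary(rows):
    """Count rows per model and the total number of packed trajectories."""
    per_model = Counter()
    trajectories = 0
    for row in rows:
        per_model[row["model"]] += 1
        trajectories += len(row["generated_solutions"])
    return {
        "total_rows": sum(per_model.values()),
        "rows_per_model": dict(per_model),
        "total_trajectories": trajectories,
    }
```

If the figures above hold, `size_summary(iter_jsonl("math-ai-bench-sources-latest.jsonl"))` should report 9276 total rows, 2319 rows for each of the four models, and 9276 × 8 = 74208 trajectories.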
## Benchmark Sources
- `aime25`
- `aime26`
- `apex_2025`
- `arxivmath`
- `cmimc_2025`
- `gpqa_diamond`
- `hmmt_feb_2026`
- `hmmt_nov_2025`
- `imobench`
- `olympiadbench`
- `theoremqa`
## JSONL Schema
Each line is a JSON object with fields:
- `problem` (`str`): question text
- `original_solution` (`str`): reference/original solution if available
- `answer` (`str`): ground-truth answer
- `source` (`str`): benchmark subset name
- `index` (`int`): question index within the subset
- `model` (`str`): model name
- `generated_solutions` (`List[str]`): 8 reasoning trajectories for the same question
- `count` (`int`): number of trajectories, always `8`
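The schema can be checked per row with a small validator. This is a sketch under the field list above; the names `SCHEMA` and `validate_row` are hypothetical, not part of the dataset.

```python
# Field name -> expected Python type, taken from the schema list above.
SCHEMA = {
    "problem": str,
    "original_solution": str,
    "answer": str,
    "source": str,
    "index": int,
    "model": str,
    "generated_solutions": list,
    "count": int,
}


def validate_row(row, expected_count=8):
    """Return a list of human-readable schema problems (empty if none)."""
    errors = []
    for field, expected_type in SCHEMA.items():
        if field not in row:
            errors.append(f"missing field: {field}")
        elif not isinstance(row[field], expected_type):
            errors.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(row[field]).__name__}"
            )

    solutions = row.get("generated_solutions")
    if isinstance(solutions, list):
        if not all(isinstance(s, str) for s in solutions):
            errors.append("generated_solutions: all items must be str")
        if len(solutions) != expected_count:
            errors.append(
                f"generated_solutions: expected {expected_count} items, "
                f"got {len(solutions)}"
            )
        if isinstance(row.get("count"), int) and row["count"] != len(solutions):
            errors.append("count does not match len(generated_solutions)")
    return errors
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a row at once.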
## Notes
- Use `(source, index)` to align the same question across different models.
- This is an aggregated multi-trajectory export. The original single-trajectory fields `generated_solution` and `sample` are replaced by `generated_solutions` and `count`.
- All four included models are aligned to the same question set.
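Cross-model alignment on `(source, index)` can be sketched as below. The helper names `align_by_question` and `questions_missing_models` are hypothetical; the rows are assumed to be already-parsed JSON objects from the file.

```python
from collections import defaultdict


def align_by_question(rows):
    """Map (source, index) -> {model name: row} for cross-model comparison."""
    aligned = defaultdict(dict)
    for row in rows:
        aligned[(row["source"], row["index"])][row["model"]] = row
    return dict(aligned)


def questions_missing_models(aligned, models):
    """Return {(source, index): [missing model names]} for incomplete questions."""
    wanted = set(models)
    missing = {}
    for key, by_model in aligned.items():
        absent = wanted - set(by_model)
        if absent:
            missing[key] = sorted(absent)
    return missing
```

Since all four models are stated to share the same question set, `questions_missing_models` should return an empty dict when called with the four model names on the full file.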
Provider:
haowu89



