
haowu89/math-ai-bench-sources-latest

Hugging Face | Updated 2026-04-06 | Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/haowu89/math-ai-bench-sources-latest
Resource description:
---
license: mit
task_categories:
- text-generation
language:
- en
tags:
- math
- reasoning
- benchmark
- trajectories
size_categories:
- 1K<n<10K
---

# math-ai-bench-sources-latest

This dataset is an updated aggregated multi-trajectory benchmark built from the latest `parallelthinking_benchmark` files under `/scratch/haowu/datasets/datasets/parallelthinking_benchmark_latest`.

It follows the same high-level format as [`haowu89/math-ai-bench-sources`](https://huggingface.co/datasets/haowu89/math-ai-bench-sources), but it is a newer version with:

- updated benchmark composition
- updated model set
- aligned question coverage across all included models

## Included Models

- `Qwen2.5-1.5B-Instruct`
- `Qwen3-1.7B`
- `Qwen3-4B`
- `Qwen3-30B-A3B-Thinking-2507`

## File

- `math-ai-bench-sources-latest.jsonl`

## Data Construction

The source data is built from math-focused benchmark subsets. For each included model, each `(source, index)` question keeps exactly 8 reasoning trajectories.

This file is the aggregated version of the per-trajectory JSONL files:

- one row per `(model, source, index)`
- all 8 trajectories are packed into `generated_solutions`

## Dataset Size

- total rows: `9276`
- rows per model: `2319`
- trajectories per row: `8`

## Benchmark Sources

- `aime25`
- `aime26`
- `apex_2025`
- `arxivmath`
- `cmimc_2025`
- `gpqa_diamond`
- `hmmt_feb_2026`
- `hmmt_nov_2025`
- `imobench`
- `olympiadbench`
- `theoremqa`

## JSONL Schema

Each line is a JSON object with fields:

- `problem` (`str`): question text
- `original_solution` (`str`): reference/original solution if available
- `answer` (`str`): ground-truth answer
- `source` (`str`): benchmark subset name
- `index` (`int`): question index within the subset
- `model` (`str`): model name
- `generated_solutions` (`List[str]`): 8 reasoning trajectories for the same question
- `count` (`int`): number of trajectories, always `8`

## Notes

- Use `(source, index)` to align the same question across different models.
- This is an aggregated multi-trajectory export. The original single-trajectory fields `generated_solution` and `sample` are replaced by `generated_solutions` and `count`.
- All four included models are aligned to the same question set.
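
The alignment described in the notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `parse_rows` and `align_by_question` are hypothetical and not part of the dataset, and the sample rows are synthetic placeholders rather than real records. The sketch parses JSONL lines, checks that `count` matches the length of `generated_solutions`, and groups rows by `(source, index)` so that each question maps to one row per model.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def parse_rows(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def align_by_question(rows: Iterable[dict]) -> Dict[Tuple[str, int], Dict[str, dict]]:
    """Group rows by (source, index); the inner dict maps model name to row."""
    aligned: Dict[Tuple[str, int], Dict[str, dict]] = defaultdict(dict)
    for row in rows:
        # The README states that count is the number of packed trajectories.
        if row["count"] != len(row["generated_solutions"]):
            raise ValueError(f"count mismatch for {row['source']}/{row['index']}")
        aligned[(row["source"], row["index"])][row["model"]] = row
    return dict(aligned)


# Synthetic placeholder rows (NOT real dataset content), following the schema.
_sample_lines = [
    json.dumps({
        "problem": "placeholder problem",
        "original_solution": "",
        "answer": "0",
        "source": "aime25",
        "index": 0,
        "model": model,
        "generated_solutions": [f"trajectory {i}" for i in range(8)],
        "count": 8,
    })
    for model in ("Qwen3-1.7B", "Qwen3-4B")
]

aligned = align_by_question(parse_rows(_sample_lines))

# For the real file, assuming it has been downloaded to the working directory:
# with open("math-ai-bench-sources-latest.jsonl", encoding="utf-8") as f:
#     aligned = align_by_question(parse_rows(f))
```

Keying on the `(source, index)` tuple rather than on `problem` text avoids depending on exact string equality of the question across models, which matches the alignment key the notes recommend.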
Provider:
haowu89