sibasmarakp/Qwen2.5-Math-7B-Instruct-math-shepherd-mistral-7b-prm-best_of_n-completions
Source: Hugging Face; updated 2026-03-28; indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/sibasmarakp/Qwen2.5-Math-7B-Instruct-math-shepherd-mistral-7b-prm-best_of_n-completions
Resource overview:
---
dataset_info:
- config_name: MATH500--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last
features:
- name: problem
dtype: string
- name: solution
dtype: string
- name: answer
dtype: string
- name: subject
dtype: string
- name: level
dtype: int64
- name: unique_id
dtype: string
- name: completions
list: string
- name: scores
list:
list: float64
- name: pred
dtype: string
- name: completion_tokens
list: int64
- name: agg_scores
list: float64
- name: pred_weighted@1
dtype: string
- name: pred_maj@1
dtype: string
- name: pred_naive@1
dtype: string
- name: pred_weighted@2
dtype: string
- name: pred_maj@2
dtype: string
- name: pred_naive@2
dtype: string
- name: pred_weighted@4
dtype: string
- name: pred_maj@4
dtype: string
- name: pred_naive@4
dtype: string
- name: pred_weighted@8
dtype: string
- name: pred_maj@8
dtype: string
- name: pred_naive@8
dtype: string
splits:
- name: train
num_bytes: 8635020
num_examples: 500
download_size: 7944661
dataset_size: 8635020
- config_name: MATH500--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last--evals
features:
- name: n
dtype: int64
- name: acc_naive
dtype: float64
- name: acc_weighted
dtype: float64
- name: acc_maj
dtype: float64
splits:
- name: train
num_bytes: 96
num_examples: 3
download_size: 2196
dataset_size: 96
- config_name: MATH500--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last
features:
- name: problem
dtype: string
- name: solution
dtype: string
- name: answer
dtype: string
- name: subject
dtype: string
- name: level
dtype: int64
- name: unique_id
dtype: string
- name: completions
list: string
- name: scores
list:
list: float64
- name: pred
dtype: string
- name: completion_tokens
list: int64
- name: agg_scores
list: float64
- name: pred_weighted@1
dtype: string
- name: pred_maj@1
dtype: string
- name: pred_naive@1
dtype: string
- name: pred_weighted@2
dtype: string
- name: pred_maj@2
dtype: string
- name: pred_naive@2
dtype: string
- name: pred_weighted@4
dtype: string
- name: pred_maj@4
dtype: string
- name: pred_naive@4
dtype: string
- name: pred_weighted@8
dtype: string
- name: pred_maj@8
dtype: string
- name: pred_naive@8
dtype: string
splits:
- name: train
num_bytes: 8789364
num_examples: 500
download_size: 8093954
dataset_size: 8789364
- config_name: MATH500--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last--evals
features:
- name: n
dtype: int64
- name: acc_naive
dtype: float64
- name: acc_weighted
dtype: float64
- name: acc_maj
dtype: float64
splits:
- name: train
num_bytes: 128
num_examples: 4
download_size: 2209
dataset_size: 128
- config_name: MATH500--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last
features:
- name: problem
dtype: string
- name: solution
dtype: string
- name: answer
dtype: string
- name: subject
dtype: string
- name: level
dtype: int64
- name: unique_id
dtype: string
- name: completions
list: string
- name: scores
list:
list: float64
- name: pred
dtype: string
- name: completion_tokens
list: int64
- name: agg_scores
list: float64
- name: pred_weighted@1
dtype: string
- name: pred_maj@1
dtype: string
- name: pred_naive@1
dtype: string
- name: pred_weighted@2
dtype: string
- name: pred_maj@2
dtype: string
- name: pred_naive@2
dtype: string
- name: pred_weighted@4
dtype: string
- name: pred_maj@4
dtype: string
- name: pred_naive@4
dtype: string
- name: pred_weighted@8
dtype: string
- name: pred_maj@8
dtype: string
- name: pred_naive@8
dtype: string
splits:
- name: train
num_bytes: 8629571
num_examples: 500
download_size: 7934752
dataset_size: 8629571
- config_name: MATH500--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last--evals
features:
- name: n
dtype: int64
- name: acc_naive
dtype: float64
- name: acc_weighted
dtype: float64
- name: acc_maj
dtype: float64
splits:
- name: train
num_bytes: 128
num_examples: 4
download_size: 2214
dataset_size: 128
- config_name: OlympiadBench--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last
features:
- name: id
dtype: int64
- name: problem
dtype: string
- name: solution
list: string
- name: answer
list: string
- name: context
dtype: 'null'
- name: image_1
dtype: 'null'
- name: image_2
dtype: 'null'
- name: image_3
dtype: 'null'
- name: image_4
dtype: 'null'
- name: image_5
dtype: 'null'
- name: image_6
dtype: 'null'
- name: image_7
dtype: 'null'
- name: image_8
dtype: 'null'
- name: image_9
dtype: 'null'
- name: modality
dtype: string
- name: difficulty
dtype: string
- name: is_multiple_answer
dtype: bool
- name: unit
dtype: string
- name: answer_type
dtype: string
- name: error
dtype: string
- name: question_type
dtype: string
- name: subfield
dtype: string
- name: subject
dtype: string
- name: language
dtype: string
- name: completions
list: string
- name: scores
list:
list: float64
- name: pred
dtype: string
- name: completion_tokens
list: int64
- name: agg_scores
list: float64
- name: pred_weighted@1
dtype: string
- name: pred_maj@1
dtype: string
- name: pred_naive@1
dtype: string
- name: pred_weighted@2
dtype: string
- name: pred_maj@2
dtype: string
- name: pred_naive@2
dtype: string
- name: pred_weighted@4
dtype: string
- name: pred_maj@4
dtype: string
- name: pred_naive@4
dtype: string
- name: pred_weighted@8
dtype: string
- name: pred_maj@8
dtype: string
- name: pred_naive@8
dtype: string
splits:
- name: train
num_bytes: 18225949
num_examples: 674
download_size: 17202127
dataset_size: 18225949
- config_name: OlympiadBench--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last--evals
features:
- name: n
dtype: int64
- name: acc_naive
dtype: float64
- name: acc_weighted
dtype: float64
- name: acc_maj
dtype: float64
splits:
- name: train
num_bytes: 128
num_examples: 4
download_size: 2231
dataset_size: 128
- config_name: OlympiadBench--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last
features:
- name: id
dtype: int64
- name: problem
dtype: string
- name: solution
list: string
- name: answer
list: string
- name: context
dtype: 'null'
- name: image_1
dtype: 'null'
- name: image_2
dtype: 'null'
- name: image_3
dtype: 'null'
- name: image_4
dtype: 'null'
- name: image_5
dtype: 'null'
- name: image_6
dtype: 'null'
- name: image_7
dtype: 'null'
- name: image_8
dtype: 'null'
- name: image_9
dtype: 'null'
- name: modality
dtype: string
- name: difficulty
dtype: string
- name: is_multiple_answer
dtype: bool
- name: unit
dtype: string
- name: answer_type
dtype: string
- name: error
dtype: string
- name: question_type
dtype: string
- name: subfield
dtype: string
- name: subject
dtype: string
- name: language
dtype: string
- name: completions
list: string
- name: scores
list:
list: float64
- name: pred
dtype: string
- name: completion_tokens
list: int64
- name: agg_scores
list: float64
- name: pred_weighted@1
dtype: string
- name: pred_maj@1
dtype: string
- name: pred_naive@1
dtype: string
- name: pred_weighted@2
dtype: string
- name: pred_maj@2
dtype: string
- name: pred_naive@2
dtype: string
- name: pred_weighted@4
dtype: string
- name: pred_maj@4
dtype: string
- name: pred_naive@4
dtype: string
- name: pred_weighted@8
dtype: string
- name: pred_maj@8
dtype: string
- name: pred_naive@8
dtype: string
splits:
- name: train
num_bytes: 18231149
num_examples: 674
download_size: 17123549
dataset_size: 18231149
- config_name: OlympiadBench--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last--evals
features:
- name: n
dtype: int64
- name: acc_naive
dtype: float64
- name: acc_weighted
dtype: float64
- name: acc_maj
dtype: float64
splits:
- name: train
num_bytes: 128
num_examples: 4
download_size: 2231
dataset_size: 128
- config_name: OlympiadBench--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last
features:
- name: id
dtype: int64
- name: problem
dtype: string
- name: solution
list: string
- name: answer
list: string
- name: context
dtype: 'null'
- name: image_1
dtype: 'null'
- name: image_2
dtype: 'null'
- name: image_3
dtype: 'null'
- name: image_4
dtype: 'null'
- name: image_5
dtype: 'null'
- name: image_6
dtype: 'null'
- name: image_7
dtype: 'null'
- name: image_8
dtype: 'null'
- name: image_9
dtype: 'null'
- name: modality
dtype: string
- name: difficulty
dtype: string
- name: is_multiple_answer
dtype: bool
- name: unit
dtype: string
- name: answer_type
dtype: string
- name: error
dtype: string
- name: question_type
dtype: string
- name: subfield
dtype: string
- name: subject
dtype: string
- name: language
dtype: string
- name: completions
list: string
- name: scores
list:
list: float64
- name: pred
dtype: string
- name: completion_tokens
list: int64
- name: agg_scores
list: float64
- name: pred_weighted@1
dtype: string
- name: pred_maj@1
dtype: string
- name: pred_naive@1
dtype: string
- name: pred_weighted@2
dtype: string
- name: pred_maj@2
dtype: string
- name: pred_naive@2
dtype: string
- name: pred_weighted@4
dtype: string
- name: pred_maj@4
dtype: string
- name: pred_naive@4
dtype: string
- name: pred_weighted@8
dtype: string
- name: pred_maj@8
dtype: string
- name: pred_naive@8
dtype: string
splits:
- name: train
num_bytes: 18438815
num_examples: 674
download_size: 17373912
dataset_size: 18438815
- config_name: OlympiadBench--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last--evals
features:
- name: n
dtype: int64
- name: acc_naive
dtype: float64
- name: acc_weighted
dtype: float64
- name: acc_maj
dtype: float64
splits:
- name: train
num_bytes: 128
num_examples: 4
download_size: 2231
dataset_size: 128
- config_name: minervamath--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last
features:
- name: problem
dtype: string
- name: answer
dtype: string
- name: completions
list: string
- name: scores
list:
list: float64
- name: pred
dtype: string
- name: completion_tokens
list: int64
- name: agg_scores
list: float64
- name: pred_weighted@1
dtype: string
- name: pred_maj@1
dtype: string
- name: pred_naive@1
dtype: string
- name: pred_weighted@2
dtype: string
- name: pred_maj@2
dtype: string
- name: pred_naive@2
dtype: string
- name: pred_weighted@4
dtype: string
- name: pred_maj@4
dtype: string
- name: pred_naive@4
dtype: string
- name: pred_weighted@8
dtype: string
- name: pred_maj@8
dtype: string
- name: pred_naive@8
dtype: string
splits:
- name: train
num_bytes: 5577220
num_examples: 272
download_size: 5239802
dataset_size: 5577220
- config_name: minervamath--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last--evals
features:
- name: n
dtype: int64
- name: acc_naive
dtype: float64
- name: acc_weighted
dtype: float64
- name: acc_maj
dtype: float64
splits:
- name: train
num_bytes: 128
num_examples: 4
download_size: 2219
dataset_size: 128
- config_name: minervamath--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last
features:
- name: problem
dtype: string
- name: answer
dtype: string
- name: completions
list: string
- name: scores
list:
list: float64
- name: pred
dtype: string
- name: completion_tokens
list: int64
- name: agg_scores
list: float64
- name: pred_weighted@1
dtype: string
- name: pred_maj@1
dtype: string
- name: pred_naive@1
dtype: string
- name: pred_weighted@2
dtype: string
- name: pred_maj@2
dtype: string
- name: pred_naive@2
dtype: string
- name: pred_weighted@4
dtype: string
- name: pred_maj@4
dtype: string
- name: pred_naive@4
dtype: string
- name: pred_weighted@8
dtype: string
- name: pred_maj@8
dtype: string
- name: pred_naive@8
dtype: string
splits:
- name: train
num_bytes: 5563411
num_examples: 272
download_size: 5209063
dataset_size: 5563411
- config_name: minervamath--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last--evals
features:
- name: n
dtype: int64
- name: acc_naive
dtype: float64
- name: acc_weighted
dtype: float64
- name: acc_maj
dtype: float64
splits:
- name: train
num_bytes: 128
num_examples: 4
download_size: 2233
dataset_size: 128
- config_name: minervamath--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last
features:
- name: problem
dtype: string
- name: answer
dtype: string
- name: completions
list: string
- name: scores
list:
list: float64
- name: pred
dtype: string
- name: completion_tokens
list: int64
- name: agg_scores
list: float64
- name: pred_weighted@1
dtype: string
- name: pred_maj@1
dtype: string
- name: pred_naive@1
dtype: string
- name: pred_weighted@2
dtype: string
- name: pred_maj@2
dtype: string
- name: pred_naive@2
dtype: string
- name: pred_weighted@4
dtype: string
- name: pred_maj@4
dtype: string
- name: pred_naive@4
dtype: string
- name: pred_weighted@8
dtype: string
- name: pred_maj@8
dtype: string
- name: pred_naive@8
dtype: string
splits:
- name: train
num_bytes: 5530198
num_examples: 272
download_size: 5156379
dataset_size: 5530198
- config_name: minervamath--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last--evals
features:
- name: n
dtype: int64
- name: acc_naive
dtype: float64
- name: acc_weighted
dtype: float64
- name: acc_maj
dtype: float64
splits:
- name: train
num_bytes: 128
num_examples: 4
download_size: 2227
dataset_size: 128
- config_name: rebuttal-MATH500--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last
features:
- name: problem
dtype: string
- name: solution
dtype: string
- name: answer
dtype: string
- name: subject
dtype: string
- name: level
dtype: int64
- name: unique_id
dtype: string
- name: completions
list: string
- name: scores
list:
list: float64
- name: pred
dtype: string
- name: completion_tokens
list: int64
- name: agg_scores
list: float64
- name: pred_weighted@1
dtype: string
- name: pred_maj@1
dtype: string
- name: pred_naive@1
dtype: string
- name: pred_weighted@2
dtype: string
- name: pred_maj@2
dtype: string
- name: pred_naive@2
dtype: string
- name: pred_weighted@4
dtype: string
- name: pred_maj@4
dtype: string
- name: pred_naive@4
dtype: string
- name: pred_weighted@8
dtype: string
- name: pred_maj@8
dtype: string
- name: pred_naive@8
dtype: string
splits:
- name: train
num_bytes: 8692672
num_examples: 500
download_size: 8001245
dataset_size: 8692672
- config_name: rebuttal-MATH500--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last--evals
features:
- name: n
dtype: int64
- name: acc_naive
dtype: float64
- name: acc_weighted
dtype: float64
- name: acc_maj
dtype: float64
splits:
- name: train
num_bytes: 128
num_examples: 4
download_size: 2218
dataset_size: 128
- config_name: rebuttal-MATH500--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last
features:
- name: problem
dtype: string
- name: solution
dtype: string
- name: answer
dtype: string
- name: subject
dtype: string
- name: level
dtype: int64
- name: unique_id
dtype: string
- name: completions
list: string
- name: scores
list:
list: float64
- name: pred
dtype: string
- name: completion_tokens
list: int64
- name: agg_scores
list: float64
- name: pred_weighted@1
dtype: string
- name: pred_maj@1
dtype: string
- name: pred_naive@1
dtype: string
- name: pred_weighted@2
dtype: string
- name: pred_maj@2
dtype: string
- name: pred_naive@2
dtype: string
- name: pred_weighted@4
dtype: string
- name: pred_maj@4
dtype: string
- name: pred_naive@4
dtype: string
- name: pred_weighted@8
dtype: string
- name: pred_maj@8
dtype: string
- name: pred_naive@8
dtype: string
splits:
- name: train
num_bytes: 8737738
num_examples: 500
download_size: 8041960
dataset_size: 8737738
- config_name: rebuttal-MATH500--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last--evals
features:
- name: n
dtype: int64
- name: acc_naive
dtype: float64
- name: acc_weighted
dtype: float64
- name: acc_maj
dtype: float64
splits:
- name: train
num_bytes: 128
num_examples: 4
download_size: 2210
dataset_size: 128
- config_name: rebuttal-MATH500--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last
features:
- name: problem
dtype: string
- name: solution
dtype: string
- name: answer
dtype: string
- name: subject
dtype: string
- name: level
dtype: int64
- name: unique_id
dtype: string
- name: completions
list: string
- name: scores
list:
list: float64
- name: pred
dtype: string
- name: completion_tokens
list: int64
- name: agg_scores
list: float64
- name: pred_weighted@1
dtype: string
- name: pred_maj@1
dtype: string
- name: pred_naive@1
dtype: string
- name: pred_weighted@2
dtype: string
- name: pred_maj@2
dtype: string
- name: pred_naive@2
dtype: string
- name: pred_weighted@4
dtype: string
- name: pred_maj@4
dtype: string
- name: pred_naive@4
dtype: string
- name: pred_weighted@8
dtype: string
- name: pred_maj@8
dtype: string
- name: pred_naive@8
dtype: string
splits:
- name: train
num_bytes: 8632821
num_examples: 500
download_size: 7938602
dataset_size: 8632821
- config_name: rebuttal-MATH500--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last--evals
features:
- name: n
dtype: int64
- name: acc_naive
dtype: float64
- name: acc_weighted
dtype: float64
- name: acc_maj
dtype: float64
splits:
- name: train
num_bytes: 128
num_examples: 4
download_size: 2219
dataset_size: 128
- config_name: rebuttal-OlympiadBench--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last
features:
- name: id
dtype: int64
- name: problem
dtype: string
- name: solution
list: string
- name: answer
list: string
- name: context
dtype: 'null'
- name: image_1
dtype: 'null'
- name: image_2
dtype: 'null'
- name: image_3
dtype: 'null'
- name: image_4
dtype: 'null'
- name: image_5
dtype: 'null'
- name: image_6
dtype: 'null'
- name: image_7
dtype: 'null'
- name: image_8
dtype: 'null'
- name: image_9
dtype: 'null'
- name: modality
dtype: string
- name: difficulty
dtype: string
- name: is_multiple_answer
dtype: bool
- name: unit
dtype: string
- name: answer_type
dtype: string
- name: error
dtype: string
- name: question_type
dtype: string
- name: subfield
dtype: string
- name: subject
dtype: string
- name: language
dtype: string
- name: completions
list: string
- name: scores
list:
list: float64
- name: pred
dtype: string
- name: completion_tokens
list: int64
- name: agg_scores
list: float64
- name: pred_weighted@1
dtype: string
- name: pred_maj@1
dtype: string
- name: pred_naive@1
dtype: string
- name: pred_weighted@2
dtype: string
- name: pred_maj@2
dtype: string
- name: pred_naive@2
dtype: string
- name: pred_weighted@4
dtype: string
- name: pred_maj@4
dtype: string
- name: pred_naive@4
dtype: string
- name: pred_weighted@8
dtype: string
- name: pred_maj@8
dtype: string
- name: pred_naive@8
dtype: string
splits:
- name: train
num_bytes: 18297133
num_examples: 674
download_size: 34354833
dataset_size: 18297133
- config_name: rebuttal-OlympiadBench--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last--evals
features:
- name: n
dtype: int64
- name: acc_naive
dtype: float64
- name: acc_weighted
dtype: float64
- name: acc_maj
dtype: float64
splits:
- name: train
num_bytes: 128
num_examples: 4
download_size: 2231
dataset_size: 128
- config_name: rebuttal-OlympiadBench--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last
features:
- name: id
dtype: int64
- name: problem
dtype: string
- name: solution
list: string
- name: answer
list: string
- name: context
dtype: 'null'
- name: image_1
dtype: 'null'
- name: image_2
dtype: 'null'
- name: image_3
dtype: 'null'
- name: image_4
dtype: 'null'
- name: image_5
dtype: 'null'
- name: image_6
dtype: 'null'
- name: image_7
dtype: 'null'
- name: image_8
dtype: 'null'
- name: image_9
dtype: 'null'
- name: modality
dtype: string
- name: difficulty
dtype: string
- name: is_multiple_answer
dtype: bool
- name: unit
dtype: string
- name: answer_type
dtype: string
- name: error
dtype: string
- name: question_type
dtype: string
- name: subfield
dtype: string
- name: subject
dtype: string
- name: language
dtype: string
- name: completions
list: string
- name: scores
list:
list: float64
- name: pred
dtype: string
- name: completion_tokens
list: int64
- name: agg_scores
list: float64
- name: pred_weighted@1
dtype: string
- name: pred_maj@1
dtype: string
- name: pred_naive@1
dtype: string
- name: pred_weighted@2
dtype: string
- name: pred_maj@2
dtype: string
- name: pred_naive@2
dtype: string
- name: pred_weighted@4
dtype: string
- name: pred_maj@4
dtype: string
- name: pred_naive@4
dtype: string
- name: pred_weighted@8
dtype: string
- name: pred_maj@8
dtype: string
- name: pred_naive@8
dtype: string
splits:
- name: train
num_bytes: 18223927
num_examples: 674
download_size: 34218810
dataset_size: 18223927
- config_name: rebuttal-OlympiadBench--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last--evals
features:
- name: n
dtype: int64
- name: acc_naive
dtype: float64
- name: acc_weighted
dtype: float64
- name: acc_maj
dtype: float64
splits:
- name: train
num_bytes: 128
num_examples: 4
download_size: 2223
dataset_size: 128
- config_name: rebuttal-OlympiadBench--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last
features:
- name: id
dtype: int64
- name: problem
dtype: string
- name: solution
list: string
- name: answer
list: string
- name: context
dtype: 'null'
- name: image_1
dtype: 'null'
- name: image_2
dtype: 'null'
- name: image_3
dtype: 'null'
- name: image_4
dtype: 'null'
- name: image_5
dtype: 'null'
- name: image_6
dtype: 'null'
- name: image_7
dtype: 'null'
- name: image_8
dtype: 'null'
- name: image_9
dtype: 'null'
- name: modality
dtype: string
- name: difficulty
dtype: string
- name: is_multiple_answer
dtype: bool
- name: unit
dtype: string
- name: answer_type
dtype: string
- name: error
dtype: string
- name: question_type
dtype: string
- name: subfield
dtype: string
- name: subject
dtype: string
- name: language
dtype: string
- name: completions
list: string
- name: scores
list:
list: float64
- name: pred
dtype: string
- name: completion_tokens
list: int64
- name: agg_scores
list: float64
- name: pred_weighted@1
dtype: string
- name: pred_maj@1
dtype: string
- name: pred_naive@1
dtype: string
- name: pred_weighted@2
dtype: string
- name: pred_maj@2
dtype: string
- name: pred_naive@2
dtype: string
- name: pred_weighted@4
dtype: string
- name: pred_maj@4
dtype: string
- name: pred_naive@4
dtype: string
- name: pred_weighted@8
dtype: string
- name: pred_maj@8
dtype: string
- name: pred_naive@8
dtype: string
splits:
- name: train
num_bytes: 18419395
num_examples: 674
download_size: 34706247
dataset_size: 18419395
- config_name: rebuttal-OlympiadBench--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last--evals
features:
- name: n
dtype: int64
- name: acc_naive
dtype: float64
- name: acc_weighted
dtype: float64
- name: acc_maj
dtype: float64
splits:
- name: train
num_bytes: 128
num_examples: 4
download_size: 2231
dataset_size: 128
- config_name: rebuttal-minervamath--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last
features:
- name: problem
dtype: string
- name: answer
dtype: string
- name: completions
list: string
- name: scores
list:
list: float64
- name: pred
dtype: string
- name: completion_tokens
list: int64
- name: agg_scores
list: float64
- name: pred_weighted@1
dtype: string
- name: pred_maj@1
dtype: string
- name: pred_naive@1
dtype: string
- name: pred_weighted@2
dtype: string
- name: pred_maj@2
dtype: string
- name: pred_naive@2
dtype: string
- name: pred_weighted@4
dtype: string
- name: pred_maj@4
dtype: string
- name: pred_naive@4
dtype: string
- name: pred_weighted@8
dtype: string
- name: pred_maj@8
dtype: string
- name: pred_naive@8
dtype: string
splits:
- name: train
num_bytes: 5577220
num_examples: 272
download_size: 5239802
dataset_size: 5577220
- config_name: rebuttal-minervamath--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last--evals
features:
- name: n
dtype: int64
- name: acc_naive
dtype: float64
- name: acc_weighted
dtype: float64
- name: acc_maj
dtype: float64
splits:
- name: train
num_bytes: 128
num_examples: 4
download_size: 2224
dataset_size: 128
- config_name: rebuttal-minervamath--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last
features:
- name: problem
dtype: string
- name: answer
dtype: string
- name: completions
list: string
- name: scores
list:
list: float64
- name: pred
dtype: string
- name: completion_tokens
list: int64
- name: agg_scores
list: float64
- name: pred_weighted@1
dtype: string
- name: pred_maj@1
dtype: string
- name: pred_naive@1
dtype: string
- name: pred_weighted@2
dtype: string
- name: pred_maj@2
dtype: string
- name: pred_naive@2
dtype: string
- name: pred_weighted@4
dtype: string
- name: pred_maj@4
dtype: string
- name: pred_naive@4
dtype: string
- name: pred_weighted@8
dtype: string
- name: pred_maj@8
dtype: string
- name: pred_naive@8
dtype: string
splits:
- name: train
num_bytes: 5563411
num_examples: 272
download_size: 5209063
dataset_size: 5563411
- config_name: rebuttal-minervamath--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last--evals
features:
- name: n
dtype: int64
- name: acc_naive
dtype: float64
- name: acc_weighted
dtype: float64
- name: acc_maj
dtype: float64
splits:
- name: train
num_bytes: 128
num_examples: 4
download_size: 2217
dataset_size: 128
- config_name: rebuttal-minervamath--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last
features:
- name: problem
dtype: string
- name: answer
dtype: string
- name: completions
list: string
- name: scores
list:
list: float64
- name: pred
dtype: string
- name: completion_tokens
list: int64
- name: agg_scores
list: float64
- name: pred_weighted@1
dtype: string
- name: pred_maj@1
dtype: string
- name: pred_naive@1
dtype: string
- name: pred_weighted@2
dtype: string
- name: pred_maj@2
dtype: string
- name: pred_naive@2
dtype: string
- name: pred_weighted@4
dtype: string
- name: pred_maj@4
dtype: string
- name: pred_naive@4
dtype: string
- name: pred_weighted@8
dtype: string
- name: pred_maj@8
dtype: string
- name: pred_naive@8
dtype: string
splits:
- name: train
num_bytes: 5530198
num_examples: 272
download_size: 5156379
dataset_size: 5530198
- config_name: rebuttal-minervamath--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last--evals
features:
- name: n
dtype: int64
- name: acc_naive
dtype: float64
- name: acc_weighted
dtype: float64
- name: acc_maj
dtype: float64
splits:
- name: train
num_bytes: 128
num_examples: 4
download_size: 2235
dataset_size: 128
configs:
- config_name: MATH500--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last
data_files:
- split: train
path: MATH500--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last/train-*
- config_name: MATH500--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last--evals
data_files:
- split: train
path: MATH500--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last--evals/train-*
- config_name: MATH500--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last
data_files:
- split: train
path: MATH500--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last/train-*
- config_name: MATH500--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last--evals
data_files:
- split: train
path: MATH500--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last--evals/train-*
- config_name: MATH500--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last
data_files:
- split: train
path: MATH500--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last/train-*
- config_name: MATH500--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last--evals
data_files:
- split: train
path: MATH500--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last--evals/train-*
- config_name: OlympiadBench--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last
data_files:
- split: train
path: OlympiadBench--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last/train-*
- config_name: OlympiadBench--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last--evals
data_files:
- split: train
path: OlympiadBench--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last--evals/train-*
- config_name: OlympiadBench--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last
data_files:
- split: train
path: OlympiadBench--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last/train-*
- config_name: OlympiadBench--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last--evals
data_files:
- split: train
path: OlympiadBench--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last--evals/train-*
- config_name: OlympiadBench--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last
data_files:
- split: train
path: OlympiadBench--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last/train-*
- config_name: OlympiadBench--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last--evals
data_files:
- split: train
path: OlympiadBench--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last--evals/train-*
- config_name: minervamath--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last
data_files:
- split: train
path: minervamath--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last/train-*
- config_name: minervamath--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last--evals
data_files:
- split: train
path: minervamath--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last--evals/train-*
- config_name: minervamath--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last
data_files:
- split: train
path: minervamath--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last/train-*
- config_name: minervamath--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last--evals
data_files:
- split: train
path: minervamath--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last--evals/train-*
- config_name: minervamath--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last
data_files:
- split: train
path: minervamath--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last/train-*
- config_name: minervamath--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last--evals
data_files:
- split: train
path: minervamath--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last--evals/train-*
- config_name: rebuttal-MATH500--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last
data_files:
- split: train
path: rebuttal-MATH500--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last/train-*
- config_name: rebuttal-MATH500--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last--evals
data_files:
- split: train
path: rebuttal-MATH500--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last--evals/train-*
- config_name: rebuttal-MATH500--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last
data_files:
- split: train
path: rebuttal-MATH500--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last/train-*
- config_name: rebuttal-MATH500--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last--evals
data_files:
- split: train
path: rebuttal-MATH500--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last--evals/train-*
- config_name: rebuttal-MATH500--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last
data_files:
- split: train
path: rebuttal-MATH500--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last/train-*
- config_name: rebuttal-MATH500--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last--evals
data_files:
- split: train
path: rebuttal-MATH500--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last--evals/train-*
- config_name: rebuttal-OlympiadBench--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last
data_files:
- split: train
path: rebuttal-OlympiadBench--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last/train-*
- config_name: rebuttal-OlympiadBench--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last--evals
data_files:
- split: train
path: rebuttal-OlympiadBench--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last--evals/train-*
- config_name: rebuttal-OlympiadBench--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last
data_files:
- split: train
path: rebuttal-OlympiadBench--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last/train-*
- config_name: rebuttal-OlympiadBench--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last--evals
data_files:
- split: train
path: rebuttal-OlympiadBench--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last--evals/train-*
- config_name: rebuttal-OlympiadBench--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last
data_files:
- split: train
path: rebuttal-OlympiadBench--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last/train-*
- config_name: rebuttal-OlympiadBench--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last--evals
data_files:
- split: train
path: rebuttal-OlympiadBench--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last--evals/train-*
- config_name: rebuttal-minervamath--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last
data_files:
- split: train
path: rebuttal-minervamath--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last/train-*
- config_name: rebuttal-minervamath--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last--evals
data_files:
- split: train
path: rebuttal-minervamath--T-0.7--top_p-0.8--n-8--seed-0--agg_strategy-last--evals/train-*
- config_name: rebuttal-minervamath--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last
data_files:
- split: train
path: rebuttal-minervamath--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last/train-*
- config_name: rebuttal-minervamath--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last--evals
data_files:
- split: train
path: rebuttal-minervamath--T-0.7--top_p-0.8--n-8--seed-1--agg_strategy-last--evals/train-*
- config_name: rebuttal-minervamath--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last
data_files:
- split: train
path: rebuttal-minervamath--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last/train-*
- config_name: rebuttal-minervamath--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last--evals
data_files:
- split: train
path: rebuttal-minervamath--T-0.7--top_p-0.8--n-8--seed-2--agg_strategy-last--evals/train-*
---
Provider:
sibasmarakp
Collected Summary
Dataset Introduction

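Construction
In mathematical reasoning, dataset construction typically depends on high-quality problem-solution pairs. This dataset combines several math benchmarks (MATH500, OlympiadBench, and minervamath) and uses a large language model to generate diverse solution paths. For each problem, the model samples eight candidate solutions at temperature 0.7 with top-p 0.8, and each solution is scored by a PRM (process reward model). This setup yields diverse solutions, and the scores give each one a quality weight that the downstream aggregation strategies rely on.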
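The selection step described above can be sketched as follows. This is a minimal illustration that assumes the usual best-of-n definitions suggested by the column names (`scores`, `agg_scores`, `pred_naive@n`, `pred_weighted@n`, `pred_maj@n`) and by `agg_strategy-last` in the config names; it is not necessarily the exact code used to build the dataset. The `preds` input (one extracted final answer per completion) and the toy numbers are hypothetical.

```python
from collections import defaultdict


def aggregate_last(step_scores):
    """'last' aggregation (assumed): a completion's score is its final step score."""
    return step_scores[-1]


def select_answers(preds, scores):
    """Return the (naive, weighted, majority) picks for one problem.

    preds  : extracted final answers, one per completion (hypothetical input)
    scores : per-step PRM scores, one list per completion
    """
    agg = [aggregate_last(s) for s in scores]

    # Naive best-of-n: answer of the single highest-scoring completion.
    naive = preds[max(range(len(preds)), key=lambda i: agg[i])]

    # Weighted best-of-n: sum the scores of completions that share an answer.
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pred, score in zip(preds, agg):
        totals[pred] += score
        counts[pred] += 1
    weighted = max(totals, key=totals.get)

    # Majority vote: most frequent answer, ignoring the scores.
    majority = max(counts, key=counts.get)
    return naive, weighted, majority


# Toy data, not taken from the dataset.
preds = ["4", "5", "4", "4"]
scores = [[0.9, 0.4], [0.8, 0.95], [0.7, 0.5], [0.6, 0.3]]
picks = select_answers(preds, scores)
print(picks)
```

In this toy case the naive pick follows the single top-scored completion, while the weighted and majority picks follow the answer that several completions agree on.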
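Usage
The dataset is mainly intended for evaluating and comparing answer-aggregation strategies on mathematical reasoning tasks. Researchers can load a specific configuration, for example a MATH500 subset for a given random seed, and check how well its various predictions agree with the reference answers. Comparing the acc_naive, acc_weighted, and acc_maj metrics quantifies the accuracy differences between aggregation methods. The per-solution scores and token counts can also be used to train or fine-tune reward models, or as a reference for inference-time verification, supporting work on more robust prediction methods for math problem solving.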
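Loading one configuration might look like the sketch below. The config-name pattern is copied from the `config_name` entries in the metadata above; the helper names `config_name` and `load_config` are hypothetical, and the repository ID is assumed to match the path in the download link. The `datasets` library call needs network access, so it is wrapped in a function and not executed here.

```python
REPO_ID = (
    "sibasmarakp/"
    "Qwen2.5-Math-7B-Instruct-math-shepherd-mistral-7b-prm-best_of_n-completions"
)


def config_name(benchmark, seed, evals=False, rebuttal=False):
    """Build a config name following the pattern listed in the dataset card."""
    name = f"{benchmark}--T-0.7--top_p-0.8--n-8--seed-{seed}--agg_strategy-last"
    if rebuttal:
        name = "rebuttal-" + name
    if evals:
        name += "--evals"
    return name


def load_config(benchmark, seed, **kwargs):
    """Load one config's single 'train' split (requires `datasets` and network)."""
    # Imported lazily so config_name() works without the library installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name(benchmark, seed, **kwargs), split="train")


print(config_name("MATH500", 0))
print(config_name("minervamath", 2, evals=True, rebuttal=True))
```

For example, `load_config("MATH500", 0)` would return the completions table, and `load_config("MATH500", 0, evals=True)` the matching accuracy table with the `n`, `acc_naive`, `acc_weighted`, and `acc_maj` columns.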
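Background and Challenges
Background Overview
At the intersection of artificial intelligence and mathematical reasoning, evaluating the math problem-solving ability of large language models (LLMs) has become a research focus. The Qwen2.5-Math-7B-Instruct-math-shepherd-mistral-7b-prm-best_of_n-completions dataset was created to systematically evaluate and improve model performance on complex mathematical tasks. It was built by a research team on top of models such as Qwen2.5-Math-7B-Instruct, and its core research question is how generating multiple candidate answers and applying scoring strategies (such as weighted voting and majority voting) can improve the accuracy and robustness of mathematical reasoning. Its contribution lies in providing a standardized benchmark for evaluating mathematical ability and in supporting performance gains on competition-level problems such as those in MATH500 and OlympiadBench.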
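Current Challenges
The dataset targets a key challenge in mathematical reasoning: models often produce logical inconsistencies or calculation errors when generating solutions. Specific challenges include handling diverse mathematical subfields (such as algebra and geometry) and difficulty levels while keeping answers precise and explanations coherent. Construction also faces the problem of quality control over multiple candidate answers: the scoring function must balance semantic understanding against numerical accuracy, and large-scale annotation of math problems is costly and depends on experts. In addition, the fairness and generalizability of the different aggregation strategies (such as weighted@n and majority@n) must be validated during evaluation.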
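Common Scenarios
Typical Use Cases
In mathematical reasoning and LLM evaluation, the dataset serves as a benchmark for systematically assessing a model's generation and reasoning on complex math problems. It covers competition-style problems from sources such as MATH500 and OlympiadBench; each problem comes with multiple model-generated candidate answers and their scores, which supports comparison of aggregation strategies (such as weighted voting and majority voting). Researchers can use it to study how model performance differs across subfields such as algebra and geometry, providing empirical evidence for improving reasoning paths.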
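Academic Problems Addressed
The dataset addresses the problem of quantitatively evaluating the stability and accuracy of model outputs in automated mathematical reasoning. By providing multiple sampled answers and their confidence scores, it supports research on uncertainty modeling and answer aggregation, for example comparing naive selection, weighted scoring, and majority voting. This helps expose the systematic biases of LLMs in math problem solving, advances explainable AI and reliable reasoning methods, and offers theoretical support for math education technology.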
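The strategy comparison described above can be sketched as a per-`n` accuracy table computed from the `pred_naive@n`, `pred_weighted@n`, and `pred_maj@n` columns. The correctness check here is a plain string match against `answer`, which is an assumption made for illustration; the published `acc_*` values may come from a math-equivalence check instead. The sketch fits configs where `answer` is a string (MATH500, minervamath), not OlympiadBench, where it is a list. The rows below are toy data.

```python
def accuracy_table(rows, ns=(1, 2, 4, 8), strategies=("naive", "weighted", "maj")):
    """Compute accuracy per strategy and sample budget n.

    rows : list of dicts with an 'answer' string and 'pred_<strategy>@<n>' strings
    """
    table = []
    for n in ns:
        entry = {"n": n}
        for strategy in strategies:
            # Assumed correctness check: exact match after trimming whitespace.
            hits = sum(
                row[f"pred_{strategy}@{n}"].strip() == row["answer"].strip()
                for row in rows
            )
            entry[f"acc_{strategy}"] = hits / len(rows)
        table.append(entry)
    return table


# Toy rows, not taken from the dataset; only n = 1 and n = 2 are filled in.
rows = [
    {"answer": "4",
     "pred_naive@1": "5", "pred_weighted@1": "5", "pred_maj@1": "5",
     "pred_naive@2": "5", "pred_weighted@2": "4", "pred_maj@2": "4"},
    {"answer": "7",
     "pred_naive@1": "7", "pred_weighted@1": "7", "pred_maj@1": "7",
     "pred_naive@2": "7", "pred_weighted@2": "7", "pred_maj@2": "7"},
]
table = accuracy_table(rows, ns=(1, 2))
for entry in table:
    print(entry)
```

The output has the same shape as the `--evals` configs: one row per `n` with `acc_naive`, `acc_weighted`, and `acc_maj`.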
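Practical Applications
In practice, the dataset is a data resource for developing intelligent tutoring systems and automated grading tools. Educational institutions can use its problems and generated answers to build adaptive learning platforms that assess students' solution approaches in real time and give personalized feedback. Technology companies can use its evaluation framework to optimize their math reasoning models, improving usefulness and reliability in scenarios such as homework assistance and competition training, and furthering the integration of AI with math education.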
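Recent Research on the Dataset
Latest Research Directions
In mathematical reasoning, evaluating and improving the capabilities of large language models has become a research focus. By integrating several math benchmarks, including MATH500, OlympiadBench, and minervamath, the Qwen2.5-Math-7B-Instruct-math-shepherd-mistral-7b-prm-best_of_n-completions dataset provides extensive multi-step reasoning generations together with their scores. Its core research direction is optimizing answer-aggregation strategies based on a process reward model (PRM), examining how weighted voting, majority voting, and naive selection each affect the accuracy of a model's math problem solving. This direction ties directly to current challenges in complex logical reasoning, provides an empirical basis for model self-improvement and robustness evaluation, and is relevant to educational AI and automated problem-solving systems.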
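Because each benchmark is released for seeds 0, 1, and 2, a robustness check can average an accuracy metric across seeds. The sketch below is one hedged way to do that with the rows of the `--evals` configs. It groups by `n` instead of assuming aligned rows, since the metadata lists 3 rows for one evals config and 4 for the others. The function name and the numbers are placeholders, not results from the dataset.

```python
from statistics import mean, stdev


def summarize_across_seeds(evals_by_seed, metric):
    """Return {n: (mean, sample std)} of `metric` over seeds.

    evals_by_seed : {seed: [{"n": ..., "acc_naive": ..., ...}, ...]}
    """
    by_n = {}
    for rows in evals_by_seed.values():
        for row in rows:
            by_n.setdefault(row["n"], []).append(row[metric])
    return {
        # A single value has no sample standard deviation, so report 0.0.
        n: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for n, values in sorted(by_n.items())
    }


# Placeholder numbers for illustration only.
evals_by_seed = {
    0: [{"n": 1, "acc_weighted": 0.5}, {"n": 8, "acc_weighted": 0.25}],
    1: [{"n": 1, "acc_weighted": 0.5}, {"n": 8, "acc_weighted": 0.5}],
    2: [{"n": 1, "acc_weighted": 0.5}, {"n": 8, "acc_weighted": 0.75}],
}
summary = summarize_across_seeds(evals_by_seed, "acc_weighted")
for n, (avg, sd) in summary.items():
    print(f"n={n}: mean={avg:.3f} std={sd:.3f}")
```

With only three seeds the standard deviation is a rough indicator, so differences between strategies that fall within it should be read cautiously.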
The content above was collected and summarized by 遇见数据集.



