
sibasmarakp/Qwen2.5-Math-7B-Instruct-math-shepherd-mistral-7b-prm-best_of_n-completions

Source: Hugging Face. Updated 2026-03-28. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/sibasmarakp/Qwen2.5-Math-7B-Instruct-math-shepherd-mistral-7b-prm-best_of_n-completions
Resource overview:
dataset_info

Config names follow the pattern <benchmark>--T-0.7--top_p-0.8--n-8--seed-<seed>--agg_strategy-last, with an optional rebuttal- prefix and an optional --evals suffix. Every config has a single split, train, and dataset_size equals num_bytes in every config.

Features shared by all completion configs:
  completions: list of string
  scores: list of list of float64
  pred: string
  completion_tokens: list of int64
  agg_scores: list of float64
  pred_weighted@n, pred_maj@n, pred_naive@n: string, for n = 1, 2, 4, 8

Additional features, MATH500 and rebuttal-MATH500:
  problem: string
  solution: string
  answer: string
  subject: string
  level: int64
  unique_id: string

Additional features, OlympiadBench and rebuttal-OlympiadBench:
  id: int64
  problem: string
  solution: list of string
  answer: list of string
  context: 'null'
  image_1 through image_9: 'null'
  modality: string
  difficulty: string
  is_multiple_answer: bool
  unit: string
  answer_type: string
  error: string
  question_type: string
  subfield: string
  subject: string
  language: string

Additional features, minervamath and rebuttal-minervamath:
  problem: string
  answer: string

Features of every --evals config:
  n: int64
  acc_naive: float64
  acc_weighted: float64
  acc_maj: float64

Sizes per config (completion config first, then its --evals config):

benchmark               seed  num_bytes  num_examples  download_size  evals num_bytes  evals num_examples  evals download_size
MATH500                 0     8635020    500           7944661        96               3                   2196
MATH500                 1     8789364    500           8093954        128              4                   2209
MATH500                 2     8629571    500           7934752        128              4                   2214
OlympiadBench           0     18225949   674           17202127       128              4                   2231
OlympiadBench           1     18231149   674           17123549       128              4                   2231
OlympiadBench           2     18438815   674           17373912       128              4                   2231
minervamath             0     5577220    272           5239802        128              4                   2219
minervamath             1     5563411    272           5209063        128              4                   2233
minervamath             2     5530198    272           5156379        128              4                   2227
rebuttal-MATH500        0     8692672    500           8001245        128              4                   2218
rebuttal-MATH500        1     8737738    500           8041960        128              4                   2210
rebuttal-MATH500        2     8632821    500           7938602        128              4                   2219
rebuttal-OlympiadBench  0     18297133   674           34354833       128              4                   2231
rebuttal-OlympiadBench  1     18223927   674           34218810       128              4                   2223
rebuttal-OlympiadBench  2     18419395   674           34706247       128              4                   2231
rebuttal-minervamath    0     5577220    272           5239802        128              4                   2224
rebuttal-minervamath    1     5563411    272           5209063        128              4                   2217
rebuttal-minervamath    2     5530198    272           5156379        128              4                   2235

configs

Each of the 36 configs above declares one data file entry: split train, path <config_name>/train-*.
Provider:
sibasmarakp
Summary
Dataset Introduction
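Construction
In mathematical reasoning, dataset construction typically depends on high-quality problem-solution pairs. This dataset combines several math benchmarks (MATH500, OlympiadBench, and minervamath) and uses a large language model to generate diverse solution paths. For each problem, the model samples eight candidate solutions at temperature 0.7 with top-p 0.8, and each candidate is scored by a PRM (process reward model). This yields diverse solutions and attaches a quality weight to each one, which the downstream aggregation strategies rely on.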
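The scoring and aggregation step described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes that `agg_strategy-last` in the config names means "take the last per-step PRM score", that "naive" picks the single highest-scoring completion, that "weighted" sums scores over identical answers, and that "maj" is a plain majority vote. The function names and the toy values are hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_last(step_scores):
    """Collapse per-step PRM scores into one score per completion.

    Assumption: the "last" strategy keeps only the final step's score.
    """
    return [steps[-1] for steps in step_scores]


def select_naive(answers, agg_scores):
    """Return the answer of the single highest-scoring completion."""
    best = max(range(len(answers)), key=lambda i: agg_scores[i])
    return answers[best]


def select_weighted(answers, agg_scores):
    """Sum scores over identical answers and return the heaviest answer."""
    totals = defaultdict(float)
    for answer, score in zip(answers, agg_scores):
        totals[answer] += score
    return max(totals, key=totals.get)


def select_majority(answers):
    """Return the most frequent answer, ignoring the scores."""
    return Counter(answers).most_common(1)[0][0]


# Toy values invented for illustration; they are not taken from the dataset.
answers = ["12", "12", "12", "15", "15", "7"]
step_scores = [[0.9, 0.1], [0.8, 0.1], [0.7, 0.1],
               [0.5, 0.3], [0.4, 0.3], [0.2, 0.5]]
agg_scores = aggregate_last(step_scores)

print(select_naive(answers, agg_scores))     # one completion with the top score
print(select_weighted(answers, agg_scores))  # answer with the largest score sum
print(select_majority(answers))              # most common answer
```

The toy input is chosen so that the three strategies disagree, which is the situation the accuracy comparison in the dataset is meant to expose.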
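Usage
The dataset is mainly used to evaluate and compare answer-aggregation strategies on mathematical reasoning tasks. Researchers can load a specific configuration, such as a MATH500 subset for a given random seed, and check how well its various predictions agree with the reference answers. Comparing the acc_naive, acc_weighted, and acc_maj metrics quantifies the accuracy differences between aggregation methods. In addition, the per-solution scores and token counts can be used to train or fine-tune reward models, or as a reference for inference-time verification, supporting the development of more robust prediction methods for math problem solving.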
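Loading a configuration might look like the sketch below. It assumes the Hugging Face `datasets` library is installed and that the repository is reachable under the ID shown in the download link; `config_name` and `load_config` are hypothetical helpers that only reproduce the naming pattern visible in the dataset card metadata.

```python
REPO_ID = (
    "sibasmarakp/Qwen2.5-Math-7B-Instruct-"
    "math-shepherd-mistral-7b-prm-best_of_n-completions"
)


def config_name(benchmark, seed, evals=False, rebuttal=False):
    """Build a config name following the pattern used in the dataset card."""
    name = f"{benchmark}--T-0.7--top_p-0.8--n-8--seed-{seed}--agg_strategy-last"
    if rebuttal:
        name = "rebuttal-" + name
    if evals:
        name += "--evals"
    return name


def load_config(benchmark, seed, **kwargs):
    """Download one config's train split (requires network access)."""
    # Imported lazily so that building names works without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name(benchmark, seed, **kwargs), split="train")


print(config_name("MATH500", 0))
print(config_name("minervamath", 2, evals=True, rebuttal=True))
```

Calling `load_config("MATH500", 0)` would fetch the per-problem completions, while `load_config("MATH500", 0, evals=True)` would fetch the small accuracy table for the same run.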
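Background and Challenges
Background Overview
At the intersection of artificial intelligence and mathematical reasoning, evaluating the math problem-solving ability of large language models (LLMs) has become a research focus. The Qwen2.5-Math-7B-Instruct-math-shepherd-mistral-7b-prm-best_of_n-completions dataset was created to systematically evaluate and improve model performance on complex math tasks. It was built by a research team on top of models such as Qwen2.5-Math-7B-Instruct, and its core research question is how generating multiple candidate answers and applying scoring strategies (such as weighted voting and majority voting) can improve the accuracy and robustness of mathematical reasoning. Its impact lies in providing a standardized benchmark for evaluating mathematical intelligence and in advancing model performance on competition-level problems such as MATH500 and OlympiadBench.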
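Current Challenges
The dataset targets a key challenge in mathematical reasoning: models that solve math problems generatively often produce logically inconsistent steps or calculation errors. Specific challenges include handling diverse mathematical subfields (such as algebra and geometry) and difficulty levels while keeping answers exact and explanations coherent. Construction also raises quality-control problems for multiple candidate answers: the scoring function must balance semantic understanding against numerical accuracy, and annotating math problems at scale is costly and depends on experts. In addition, the dataset must support verifying that the different aggregation strategies (such as weighted@n and majority@n) are compared fairly and generalize across evaluations.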
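Common Scenarios
Typical Use Cases
In mathematical reasoning and LLM evaluation, the dataset serves as a benchmarking tool for systematically assessing a model's generation and reasoning abilities on complex math problems. It covers math competition problems from several sources, including MATH500 and OlympiadBench; each problem comes with multiple model-generated candidate answers and their scores, which supports comparative analysis of aggregation strategies (such as weighted averaging and majority voting). With it, researchers can examine how model performance differs across subfields such as algebra and geometry, which provides empirical evidence for optimizing reasoning paths.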
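A comparison of aggregation strategies can be sketched as a small table builder that mirrors the columns of the `--evals` configs (`n`, `acc_naive`, `acc_weighted`, `acc_maj`). The sketch makes two assumptions that are not confirmed by the source: correctness is judged by exact string match after stripping whitespace (a real evaluation would more likely use a math-aware equivalence check), and `answer` is a single string, which holds for the MATH500 and minervamath schemas but not for OlympiadBench, where `answer` is a list. The function name and the toy rows are hypothetical.

```python
def accuracy_table(rows, ns=(1, 2, 4, 8), strategies=("naive", "weighted", "maj")):
    """Compute per-n accuracy for each strategy from pred_<strategy>@<n> columns."""
    if not rows:
        raise ValueError("rows must not be empty")
    table = []
    for n in ns:
        entry = {"n": n}
        for strategy in strategies:
            # Assumption: exact string match stands in for answer equivalence.
            correct = sum(
                row[f"pred_{strategy}@{n}"].strip() == row["answer"].strip()
                for row in rows
            )
            entry[f"acc_{strategy}"] = correct / len(rows)
        table.append(entry)
    return table


# Toy rows invented for illustration; they are not taken from the dataset.
rows = [
    {"answer": "4",
     "pred_naive@1": "4", "pred_weighted@1": "4", "pred_maj@1": "4",
     "pred_naive@2": "5", "pred_weighted@2": "4", "pred_maj@2": "4"},
    {"answer": "10",
     "pred_naive@1": "9", "pred_weighted@1": "9", "pred_maj@1": "9",
     "pred_naive@2": "10", "pred_weighted@2": "10", "pred_maj@2": "9"},
]

for entry in accuracy_table(rows, ns=(1, 2)):
    print(entry)
```

Rows loaded from one of the completion configs can be passed in directly, since they expose the same `pred_<strategy>@<n>` column names.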
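Academic Problems Addressed
The dataset addresses the problem of quantitatively evaluating the stability and accuracy of model outputs in automated mathematical reasoning. By providing multiple sampled answers together with their confidence scores, it lets researchers explore directions such as uncertainty modeling and answer-aggregation mechanisms, for example by comparing naive selection, weighted scoring, and majority voting. This helps reveal the inherent biases of large language models in math problem solving, supports progress in explainable AI and reliable reasoning methods, and offers a theoretical basis for math education technology.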
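Practical Applications
In practice, the dataset is a data resource for developing intelligent tutoring systems and automated grading tools. Educational institutions can use its problems and generated answers to build adaptive learning platforms that assess students' solution approaches in real time and give personalized feedback. Technology companies can use its evaluation framework to improve their mathematical reasoning models, making their products more useful and reliable in scenarios such as homework assistance and competition training, and bringing AI and math education closer together.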
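Recent Research on the Dataset
Latest Research Directions
In mathematical reasoning, evaluating and improving the capabilities of large language models has become a research focus. By combining several math benchmarks (MATH500, OlympiadBench, and minervamath), the Qwen2.5-Math-7B-Instruct-math-shepherd-mistral-7b-prm-best_of_n-completions dataset provides a large body of multi-step reasoning generations with scores. Its core research direction is optimizing answer-aggregation strategies based on a process reward model (PRM), examining how well weighted voting, majority voting, and naive selection improve a model's accuracy on math problems. This direction ties directly to current challenges for AI in complex logical reasoning, provides an empirical basis for model self-improvement and robustness evaluation, and is relevant to the development of educational AI and automated problem-solving systems.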
The above content was collected and summarized by 遇见数据集.