nandansarkar/base_model_on_log_odds_ranked_samples_without_suffix_eval_a47d

Name: nandansarkar/base_model_on_log_odds_ranked_samples_without_suffix_eval_a47d
Creator: nandansarkar
Published: 2025-12-15 13:12:14
License: No description available

Source: Hugging Face
Updated: 2025-12-15
Indexed: 2025-12-20

Download link:

https://hf-mirror.com/datasets/nandansarkar/base_model_on_log_odds_ranked_samples_without_suffix_eval_a47d

Resource summary:

# nandansarkar/base_model_on_log_odds_ranked_samples_without_suffix_eval_a47d

Precomputed model outputs for evaluation.

## Evaluation Results

### Summary

| Metric | AIME24 | AIME25 |
|--------|--------|--------|
| Accuracy (%) | 14.0 | 11.3 |

### AIME24

- **Average Accuracy**: 14.00% ± 1.48%
- **Number of Runs**: 10

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|------------------|-----------------|
| 1 | 13.33% | 4 | 30 |
| 2 | 13.33% | 4 | 30 |
| 3 | 16.67% | 5 | 30 |
| 4 | 6.67% | 2 | 30 |
| 5 | 16.67% | 5 | 30 |
| 6 | 23.33% | 7 | 30 |
| 7 | 16.67% | 5 | 30 |
| 8 | 13.33% | 4 | 30 |
| 9 | 13.33% | 4 | 30 |
| 10 | 6.67% | 2 | 30 |

### AIME25

- **Average Accuracy**: 11.33% ± 1.07%
- **Number of Runs**: 10

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|------------------|-----------------|
| 1 | 6.67% | 2 | 30 |
| 2 | 13.33% | 4 | 30 |
| 3 | 13.33% | 4 | 30 |
| 4 | 13.33% | 4 | 30 |
| 5 | 3.33% | 1 | 30 |
| 6 | 13.33% | 4 | 30 |
| 7 | 13.33% | 4 | 30 |
| 8 | 13.33% | 4 | 30 |
| 9 | 13.33% | 4 | 30 |
| 10 | 10.00% | 3 | 30 |
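The card does not say how the "±" figure is computed. As a hedged sketch, the snippet below recomputes the averages from the per-run "Questions Solved" counts. It assumes that "±" is the standard error of the mean, taken as the population standard deviation (ddof=0) of the per-run accuracies divided by √n. The names `SOLVED`, `TOTAL_QUESTIONS`, and `summarize` are hypothetical and used only for illustration.

```python
import math

# Per-run "Questions Solved" counts copied from the card; every run has 30 questions.
TOTAL_QUESTIONS = 30
SOLVED = {
    "AIME24": [4, 4, 5, 2, 5, 7, 5, 4, 4, 2],
    "AIME25": [2, 4, 4, 4, 1, 4, 4, 4, 4, 3],
}


def summarize(solved_per_run, total):
    """Return (mean accuracy %, standard error %) across runs.

    Assumption (not stated on the card): the "±" value is the population
    standard deviation (ddof=0) of per-run accuracies divided by sqrt(n).
    """
    accs = [100.0 * s / total for s in solved_per_run]
    n = len(accs)
    mean = sum(accs) / n
    pop_var = sum((a - mean) ** 2 for a in accs) / n  # ddof=0 variance
    stderr = math.sqrt(pop_var) / math.sqrt(n)
    return mean, stderr


if __name__ == "__main__":
    for name, solved in SOLVED.items():
        mean, se = summarize(solved, TOTAL_QUESTIONS)
        print(f"{name}: {mean:.2f}% ± {se:.2f}%")
```

Under this assumption the script prints `AIME24: 14.00% ± 1.48%` and `AIME25: 11.33% ± 1.07%`, which match the card. Using the sample standard deviation (ddof=1) instead would give about 1.56% and 1.13%, so the ddof=0 reading is the one consistent with the reported values. Rounding the means to one decimal gives the 14.0 and 11.3 shown in the summary table.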

Provider:

nandansarkar
