
nandansarkar/base_model_on_log_odds_ranked_samples_without_suffix_eval_c693

Source: Hugging Face. Updated 2025-12-15. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/nandansarkar/base_model_on_log_odds_ranked_samples_without_suffix_eval_c693
Description:
# nandansarkar/base_model_on_log_odds_ranked_samples_without_suffix_eval_c693

Precomputed model outputs for evaluation.

## Evaluation Results

### Summary

| Metric | AIME24 | AIME25 | GPQADiamond | JEEBench |
|--------|--------|--------|-------------|----------|
| Accuracy (%) | 15.7 | 9.7 | 22.1 | 27.1 |

### AIME24

- **Average Accuracy**: 15.67% ± 0.67%
- **Number of Runs**: 10

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|------------------|-----------------|
| 1 | 16.67% | 5 | 30 |
| 2 | 16.67% | 5 | 30 |
| 3 | 16.67% | 5 | 30 |
| 4 | 13.33% | 4 | 30 |
| 5 | 13.33% | 4 | 30 |
| 6 | 13.33% | 4 | 30 |
| 7 | 16.67% | 5 | 30 |
| 8 | 16.67% | 5 | 30 |
| 9 | 20.00% | 6 | 30 |
| 10 | 13.33% | 4 | 30 |

### AIME25

- **Average Accuracy**: 9.67% ± 1.52%
- **Number of Runs**: 10

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|------------------|-----------------|
| 1 | 10.00% | 3 | 30 |
| 2 | 20.00% | 6 | 30 |
| 3 | 6.67% | 2 | 30 |
| 4 | 16.67% | 5 | 30 |
| 5 | 6.67% | 2 | 30 |
| 6 | 10.00% | 3 | 30 |
| 7 | 6.67% | 2 | 30 |
| 8 | 6.67% | 2 | 30 |
| 9 | 10.00% | 3 | 30 |
| 10 | 3.33% | 1 | 30 |

### GPQADiamond

- **Average Accuracy**: 22.12% ± 0.90%
- **Number of Runs**: 5

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|------------------|-----------------|
| 1 | 19.70% | 39 | 198 |
| 2 | 23.74% | 47 | 198 |
| 3 | 23.23% | 46 | 198 |
| 4 | 19.70% | 39 | 198 |
| 5 | 24.24% | 48 | 198 |

### JEEBench

- **Average Accuracy**: 27.08% ± 0.66%
- **Number of Runs**: 5

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|------------------|-----------------|
| 1 | 26.46% | 136.25 | 515 |
| 2 | 27.33% | 140.75 | 515 |
| 3 | 26.80% | 138.0 | 515 |
| 4 | 25.15% | 129.5 | 515 |
| 5 | 29.66% | 152.75 | 515 |
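
The card does not say how the "Average Accuracy" and "±" figures are derived. As a rough sketch, the reported values appear consistent with the mean of the per-run accuracies and a standard error computed from the population standard deviation divided by the square root of the number of runs. That formula is an assumption inferred from the numbers, not something the source states, and the function name `summarize_runs` is hypothetical.

```python
import math
import statistics


def summarize_runs(solved, total):
    """Return (mean accuracy %, standard error %) over evaluation runs.

    Assumption: the "±" value is the population standard deviation of the
    per-run accuracies divided by sqrt(number of runs). The dataset card
    does not document this; it is inferred from the reported figures.
    """
    accuracies = [100.0 * s / total for s in solved]
    mean = statistics.fmean(accuracies)
    sem = statistics.pstdev(accuracies) / math.sqrt(len(accuracies))
    return mean, sem


# "Questions Solved" columns copied from the per-run tables.
aime24_solved = [5, 5, 5, 4, 4, 4, 5, 5, 6, 4]
gpqa_solved = [39, 47, 46, 39, 48]

mean, sem = summarize_runs(aime24_solved, total=30)
print(f"AIME24: {mean:.2f}% ± {sem:.2f}%")  # AIME24: 15.67% ± 0.67%

mean, sem = summarize_runs(gpqa_solved, total=198)
print(f"GPQADiamond: {mean:.2f}% ± {sem:.2f}%")  # GPQADiamond: 22.12% ± 0.90%
```

Using the sample standard deviation (`statistics.stdev`) instead would give a slightly larger "±" value, which is why the population form is assumed here.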
Provider:
nandansarkar