
nandansarkar/base_model_on_log_odds_ranked_samples_with_suffix_eval_a47d

Source: Hugging Face. Updated: 2025-12-15. Indexed: 2025-12-20.
Download link:
https://hf-mirror.com/datasets/nandansarkar/base_model_on_log_odds_ranked_samples_with_suffix_eval_a47d
Description:
# nandansarkar/base_model_on_log_odds_ranked_samples_with_suffix_eval_a47d

Precomputed model outputs for evaluation.

## Evaluation Results

### Summary

| Metric | AIME24 | AIME25 |
|--------|--------|--------|
| Accuracy (%) | 14.0 | 9.7 |

### AIME24

- **Average Accuracy**: 14.00% ± 0.92%
- **Number of Runs**: 10

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|------------------|-----------------|
| 1 | 16.67% | 5 | 30 |
| 2 | 16.67% | 5 | 30 |
| 3 | 6.67% | 2 | 30 |
| 4 | 16.67% | 5 | 30 |
| 5 | 13.33% | 4 | 30 |
| 6 | 13.33% | 4 | 30 |
| 7 | 16.67% | 5 | 30 |
| 8 | 13.33% | 4 | 30 |
| 9 | 13.33% | 4 | 30 |
| 10 | 13.33% | 4 | 30 |

### AIME25

- **Average Accuracy**: 9.67% ± 0.99%
- **Number of Runs**: 10

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|------------------|-----------------|
| 1 | 10.00% | 3 | 30 |
| 2 | 13.33% | 4 | 30 |
| 3 | 3.33% | 1 | 30 |
| 4 | 10.00% | 3 | 30 |
| 5 | 13.33% | 4 | 30 |
| 6 | 13.33% | 4 | 30 |
| 7 | 10.00% | 3 | 30 |
| 8 | 6.67% | 2 | 30 |
| 9 | 6.67% | 2 | 30 |
| 10 | 10.00% | 3 | 30 |
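The card does not say how the ± figures are computed. The sketch below recomputes the averages from the per-run "Questions Solved" counts, on the assumption that the ± value is the standard error of the mean taken with the population standard deviation (dividing by the number of runs, not runs minus one). That assumption is an inference from the numbers, not something the card states; the function name `summarize` and the variable names are illustrative only.

```python
import math

# Per-run "Questions Solved" counts copied from the tables in the card.
TOTAL_QUESTIONS = 30
SOLVED = {
    "AIME24": [5, 5, 2, 5, 4, 4, 5, 4, 4, 4],
    "AIME25": [3, 4, 1, 3, 4, 4, 3, 2, 2, 3],
}


def summarize(solved, total=TOTAL_QUESTIONS):
    """Return (mean accuracy %, standard error %) across runs.

    Assumption: the standard error uses the population variance
    (divide by n), which is what matches the reported ± values.
    """
    accuracies = [100.0 * s / total for s in solved]
    n = len(accuracies)
    mean = sum(accuracies) / n
    variance = sum((a - mean) ** 2 for a in accuracies) / n
    std_error = math.sqrt(variance / n)
    return mean, std_error


for name, solved in SOLVED.items():
    mean, std_error = summarize(solved)
    print(f"{name}: {mean:.2f}% ± {std_error:.2f}%")
# AIME24: 14.00% ± 0.92%
# AIME25: 9.67% ± 0.99%
```

Using the sample variance (dividing by n - 1) instead would give roughly ± 0.97% and ± 1.05%, which does not match the card, so the population form appears to be the one in use.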
Provider:
nandansarkar