nandansarkar/base_model_on_log_odds_ranked_samples_without_suffix_eval_a47d
Source: Hugging Face · Updated 2025-12-15 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/nandansarkar/base_model_on_log_odds_ranked_samples_without_suffix_eval_a47d
Resource description:
# nandansarkar/base_model_on_log_odds_ranked_samples_without_suffix_eval_a47d
Precomputed model outputs for evaluation.
## Evaluation Results
### Summary
| Metric | AIME24 | AIME25 |
|--------|--------|--------|
| Average accuracy (%) | 14.0 | 11.3 |
### AIME24
- **Average Accuracy**: 14.00% ± 1.48%
- **Number of Runs**: 10
| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|-----------------|----------------|
| 1 | 13.33% | 4 | 30 |
| 2 | 13.33% | 4 | 30 |
| 3 | 16.67% | 5 | 30 |
| 4 | 6.67% | 2 | 30 |
| 5 | 16.67% | 5 | 30 |
| 6 | 23.33% | 7 | 30 |
| 7 | 16.67% | 5 | 30 |
| 8 | 13.33% | 4 | 30 |
| 9 | 13.33% | 4 | 30 |
| 10 | 6.67% | 2 | 30 |
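As a sanity check, the AIME24 average can be recomputed from the per-run counts in the table. The card does not define its "±" figure. This sketch assumes it is the population standard deviation of the per-run accuracies divided by sqrt(number of runs), and that assumption reproduces 14.00% ± 1.48%. The helper name `summarize` is hypothetical and not part of any published evaluation tooling.

```python
import math
import statistics

# Per-run counts of solved questions, copied from the AIME24 table above.
AIME24_SOLVED = [4, 4, 5, 2, 5, 7, 5, 4, 4, 2]
TOTAL_QUESTIONS = 30  # each run scores the same 30 questions

def summarize(solved, total):
    """Return (mean accuracy %, spread %) over independent runs.

    Assumption (not stated in the card): the spread is the population
    standard deviation (ddof=0) of per-run accuracies divided by sqrt(runs).
    """
    accuracies = [100.0 * s / total for s in solved]
    mean = statistics.fmean(accuracies)
    sem = statistics.pstdev(accuracies) / math.sqrt(len(accuracies))
    return mean, sem

mean, sem = summarize(AIME24_SOLVED, TOTAL_QUESTIONS)
print(f"{mean:.2f}% ± {sem:.2f}%")  # 14.00% ± 1.48%
```

All runs use the same 30 questions. The mean of per-run accuracies therefore equals the pooled accuracy: 42 solved out of 300 attempts.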
### AIME25
- **Average Accuracy**: 11.33% ± 1.07%
- **Number of Runs**: 10
| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|-----------------|----------------|
| 1 | 6.67% | 2 | 30 |
| 2 | 13.33% | 4 | 30 |
| 3 | 13.33% | 4 | 30 |
| 4 | 13.33% | 4 | 30 |
| 5 | 3.33% | 1 | 30 |
| 6 | 13.33% | 4 | 30 |
| 7 | 13.33% | 4 | 30 |
| 8 | 13.33% | 4 | 30 |
| 9 | 13.33% | 4 | 30 |
| 10 | 10.00% | 3 | 30 |
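With only 10 runs, the standard-deviation convention visibly changes the reported spread. This check assumes the "±" is a standard deviation divided by sqrt(number of runs). Under that assumption, the population form (ddof=0) matches the card's 1.07% for AIME25. The sample form (ddof=1) would give about 1.13%. The check uses only the Python standard library.

```python
import math
import statistics

# Per-run counts of solved questions, copied from the AIME25 table above.
AIME25_SOLVED = [2, 4, 4, 4, 1, 4, 4, 4, 4, 3]
TOTAL_QUESTIONS = 30

accuracies = [100.0 * s / TOTAL_QUESTIONS for s in AIME25_SOLVED]
n = len(accuracies)

mean = statistics.fmean(accuracies)
# Assumed definition of "±": standard deviation / sqrt(n), computed both ways.
sem_population = statistics.pstdev(accuracies) / math.sqrt(n)  # ddof=0
sem_sample = statistics.stdev(accuracies) / math.sqrt(n)       # ddof=1

print(f"mean = {mean:.2f}%")                            # mean = 11.33%
print(f"± (population std) = {sem_population:.2f}%")    # ± (population std) = 1.07%
print(f"± (sample std)     = {sem_sample:.2f}%")        # ± (sample std)     = 1.13%
```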
Provided by:
nandansarkar



