nandansarkar/base_model_on_log_odds_ranked_samples_without_suffix_eval_c693

Hugging Face · Updated 2025-12-15 · Indexed 2025-12-20

Download link: https://hf-mirror.com/datasets/nandansarkar/base_model_on_log_odds_ranked_samples_without_suffix_eval_c693
# nandansarkar/base_model_on_log_odds_ranked_samples_without_suffix_eval_c693
Precomputed model outputs for evaluation.
## Evaluation Results
### Summary
| Metric | AIME24 | AIME25 | GPQADiamond | JEEBench |
|--------|--------|--------|-------------|----------|
| Mean accuracy (%) | 15.7 | 9.7 | 22.1 | 27.1 |
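
The per-benchmark sections below report a per-run accuracy, an average over runs, and a ± term. As a sketch, with $R$ runs, $N$ questions per run, and $s_r$ questions solved in run $r$, these quantities can be written as follows. The card does not define the ± term; the formula for $\varepsilon$ (our notation for it: the population standard deviation of the run accuracies divided by $\sqrt{R}$) is an inference that happens to reproduce all four reported ± values.

```latex
a_r = 100 \cdot \frac{s_r}{N}, \qquad
\bar{a} = \frac{1}{R} \sum_{r=1}^{R} a_r, \qquad
\varepsilon = \frac{1}{\sqrt{R}} \sqrt{\frac{1}{R} \sum_{r=1}^{R} \left( a_r - \bar{a} \right)^2}
```

For AIME24 ($R = 10$, $N = 30$, $\sum_r s_r = 47$) this gives $\bar{a} = 4700 / 300 \approx 15.67$ and $\varepsilon = \sqrt{41/90} \approx 0.67$.
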
### AIME24
- **Average Accuracy**: 15.67% ± 0.67%
- **Number of Runs**: 10

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|-----------------|----------------|
| 1 | 16.67% | 5 | 30 |
| 2 | 16.67% | 5 | 30 |
| 3 | 16.67% | 5 | 30 |
| 4 | 13.33% | 4 | 30 |
| 5 | 13.33% | 4 | 30 |
| 6 | 13.33% | 4 | 30 |
| 7 | 16.67% | 5 | 30 |
| 8 | 16.67% | 5 | 30 |
| 9 | 20.00% | 6 | 30 |
| 10 | 13.33% | 4 | 30 |
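
A minimal Python sketch of the averaging, applied to the AIME24 runs above. The function name `mean_and_error` is hypothetical, and the ± formula (population standard deviation with `ddof=0`, divided by √n) is an assumption inferred from the reported numbers, not taken from the evaluation code.

```python
import math


def mean_and_error(solved, total):
    """Return (mean accuracy %, ± term %) across runs.

    Assumption: '±' = population std (ddof=0) of per-run accuracies / sqrt(n).
    This matches the card's numbers but is inferred, not documented.
    """
    accs = [100.0 * s / total for s in solved]  # per-run accuracy in percent
    n = len(accs)
    mean = sum(accs) / n
    var = sum((a - mean) ** 2 for a in accs) / n  # population variance (ddof=0)
    return mean, math.sqrt(var) / math.sqrt(n)


# Questions solved in AIME24 runs 1-10, 30 questions per run.
aime24_solved = [5, 5, 5, 4, 4, 4, 5, 5, 6, 4]
mean, err = mean_and_error(aime24_solved, 30)
print(f"{mean:.2f}% ± {err:.2f}%")  # 15.67% ± 0.67%
```

Passing the solved counts from the other tables (with 30, 198, or 515 questions per run) yields the reported 9.67% ± 1.52%, 22.12% ± 0.90%, and 27.08% ± 0.66%. Using the sample standard deviation (`ddof=1`) instead would give a slightly larger value, about 0.71% for AIME24, which does not match the card.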
### AIME25
- **Average Accuracy**: 9.67% ± 1.52%
- **Number of Runs**: 10

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|-----------------|----------------|
| 1 | 10.00% | 3 | 30 |
| 2 | 20.00% | 6 | 30 |
| 3 | 6.67% | 2 | 30 |
| 4 | 16.67% | 5 | 30 |
| 5 | 6.67% | 2 | 30 |
| 6 | 10.00% | 3 | 30 |
| 7 | 6.67% | 2 | 30 |
| 8 | 6.67% | 2 | 30 |
| 9 | 10.00% | 3 | 30 |
| 10 | 3.33% | 1 | 30 |
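
Because the Accuracy column is derived from Questions Solved, the two can be cross-checked. The sketch below does this for the AIME25 runs, assuming accuracy is `solved / total × 100` rounded to two decimals; the helper name `mismatches` is hypothetical.

```python
# AIME25 rows as (reported accuracy %, questions solved); 30 questions per run.
rows = [(10.00, 3), (20.00, 6), (6.67, 2), (16.67, 5), (6.67, 2),
        (10.00, 3), (6.67, 2), (6.67, 2), (10.00, 3), (3.33, 1)]
TOTAL = 30


def mismatches(rows, total):
    """Return 1-based run numbers whose reported accuracy differs from solved/total."""
    bad = []
    for run, (reported, solved) in enumerate(rows, start=1):
        recomputed = round(100 * solved / total, 2)  # two decimals, as in the table
        if abs(recomputed - reported) > 1e-9:
            bad.append(run)
    return bad


print(mismatches(rows, TOTAL))  # []
```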
### GPQADiamond
- **Average Accuracy**: 22.12% ± 0.90%
- **Number of Runs**: 5

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|-----------------|----------------|
| 1 | 19.70% | 39 | 198 |
| 2 | 23.74% | 47 | 198 |
| 3 | 23.23% | 46 | 198 |
| 4 | 19.70% | 39 | 198 |
| 5 | 24.24% | 48 | 198 |
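
Every GPQADiamond run uses the same 198 questions, so the average of the per-run accuracies equals the pooled accuracy over all 5 × 198 = 990 attempts. A small check with exact fractions (a sketch, not part of the original evaluation code):

```python
from fractions import Fraction

# GPQADiamond: questions solved in each of the 5 runs, 198 questions per run.
solved = [39, 47, 46, 39, 48]
total = 198

# Mean of per-run accuracies, using exact rational arithmetic to avoid float noise.
mean_of_runs = sum(Fraction(s, total) for s in solved) / len(solved)
# Pooled accuracy: all solved answers over all attempted questions.
pooled = Fraction(sum(solved), total * len(solved))

assert mean_of_runs == pooled  # holds because every run has the same total
print(sum(solved), total * len(solved), f"{float(pooled) * 100:.2f}%")  # 219 990 22.12%
```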
### JEEBench
- **Average Accuracy**: 27.08% ± 0.66%
- **Number of Runs**: 5

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|-----------------|----------------|
| 1 | 26.46% | 136.25 | 515 |
| 2 | 27.33% | 140.75 | 515 |
| 3 | 26.80% | 138.00 | 515 |
| 4 | 25.15% | 129.50 | 515 |
| 5 | 29.66% | 152.75 | 515 |
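
The JEEBench Questions Solved values are fractional, and each is a whole multiple of 0.25. The card does not say why; partial credit on some questions is only an assumption here. The sketch below checks the quarter-point property and recomputes the per-run Accuracy column and the average from those scores.

```python
# JEEBench: fractional "Questions Solved" per run; 515 questions per run.
scores = [136.25, 140.75, 138.0, 129.5, 152.75]
total = 515

# Every score is a whole multiple of 0.25 (assumed to come from partial credit).
assert all((s * 4).is_integer() for s in scores)

# Per-run accuracy in percent, rounded to two decimals as in the table.
per_run = [round(100 * s / total, 2) for s in scores]
print(per_run)  # [26.46, 27.33, 26.8, 25.15, 29.66]

# Average over runs equals total score over total questions (same 515 per run).
print(f"{100 * sum(scores) / (total * len(scores)):.2f}%")  # 27.08%
```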

Provided by: nandansarkar



