DeepSeek-R1-Distill-Qwen-7B_eval_03-07-25_17-46_2870
Source: ModelScope community (魔搭社区). Updated 2025-10-03; indexed 2025-10-04.
Download link:
https://modelscope.cn/datasets/mlfoundations-dev/DeepSeek-R1-Distill-Qwen-7B_eval_03-07-25_17-46_2870
Description:
# mlfoundations-dev/DeepSeek-R1-Distill-Qwen-7B_eval_03-07-25_17-46_2870
Precomputed model outputs for evaluation.
## Evaluation Results
### AIME24 (American Invitational Mathematics Examination, 2024)
- **Average Accuracy**: 37.33% ± 3.18%
- **Number of Runs**: 5
| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|-----------------|----------------|
| 1 | 36.67% | 11 | 30 |
| 2 | 33.33% | 10 | 30 |
| 3 | 43.33% | 13 | 30 |
| 4 | 46.67% | 14 | 30 |
| 5 | 26.67% | 8 | 30 |
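
The summary line can be checked against the per-run table with a short calculation. The sketch below recomputes the average from the "Questions Solved" column. It assumes, since the dataset card does not say so, that the "±" term is the standard error of the mean based on the population standard deviation (dividing by n rather than n - 1); under that assumption the reported figures are reproduced.

```python
import math

# Questions solved per run, copied from the table above (30 questions each).
solved = [11, 10, 13, 14, 8]
total_questions = 30

# Per-run accuracy in percent.
accuracies = [100.0 * s / total_questions for s in solved]
mean_acc = sum(accuracies) / len(accuracies)

# Assumption (not stated in the dataset card): the "±" value is the standard
# error of the mean, using the population variance (divide by n, not n - 1).
variance = sum((a - mean_acc) ** 2 for a in accuracies) / len(accuracies)
std_err = math.sqrt(variance / len(accuracies))

print(f"{mean_acc:.2f}% ± {std_err:.2f}%")  # prints "37.33% ± 3.18%"
```

With only five runs of 30 questions, a single question shifts one run's accuracy by about 3.33 percentage points, so the run-to-run spread (26.67% to 46.67%) is worth reading alongside the average.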
Provider:
maas
Created:
2025-10-03



