DeepSeek-R1-Distill-Qwen-7B_eval_03-07-25_19-30_0981
ModelScope community · Updated 2025-12-03 · Indexed 2025-11-22
Download link:
https://modelscope.cn/datasets/mlfoundations-dev/DeepSeek-R1-Distill-Qwen-7B_eval_03-07-25_19-30_0981
Resource description:
# mlfoundations-dev/DeepSeek-R1-Distill-Qwen-7B_eval_03-07-25_19-30_0981
Precomputed model outputs for evaluation.
## Evaluation Results
### Summary
| Metric | AIME24 | AIME25 | AMC23 | GPQADiamond | MATH500 |
|--------|------|------|-----|-----------|-------|
| Accuracy (%) | 52.0 | 38.0 | 89.5 | 32.7 | 89.0 |
### AIME24
- **Average Accuracy**: 52.00% ± 4.77%
- **Number of Runs**: 5
| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|-----------------|----------------|
| 1 | 50.00% | 15 | 30 |
| 2 | 50.00% | 15 | 30 |
| 3 | 53.33% | 16 | 30 |
| 4 | 70.00% | 21 | 30 |
| 5 | 36.67% | 11 | 30 |
### AIME25
- **Average Accuracy**: 38.00% ± 2.23%
- **Number of Runs**: 5
| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|-----------------|----------------|
| 1 | 36.67% | 11 | 30 |
| 2 | 43.33% | 13 | 30 |
| 3 | 36.67% | 11 | 30 |
| 4 | 30.00% | 9 | 30 |
| 5 | 43.33% | 13 | 30 |
### AMC23
- **Average Accuracy**: 89.50% ± 0.84%
- **Number of Runs**: 5
| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|-----------------|----------------|
| 1 | 87.50% | 35 | 40 |
| 2 | 92.50% | 37 | 40 |
| 3 | 90.00% | 36 | 40 |
| 4 | 90.00% | 36 | 40 |
| 5 | 87.50% | 35 | 40 |
### GPQADiamond
- **Average Accuracy**: 32.66% ± 6.48%
- **Number of Runs**: 3
| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|-----------------|----------------|
| 1 | 23.74% | 47 | 198 |
| 2 | 25.76% | 51 | 198 |
| 3 | 48.48% | 96 | 198 |
### MATH500
- **Accuracy**: 89.00%
| Accuracy | Questions Solved | Total Questions |
|----------|-----------------|----------------|
| 89.00% | 445 | 500 |
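
The averages and ± values reported above can be cross-checked from the per-run counts. The sketch below is a minimal example, not the evaluation code used to produce this card. The card does not say how the ± value is computed; the formula used here (population standard deviation across runs, divided by the square root of the number of runs) is an assumption that happens to reproduce the reported figures. The names `RUNS` and `summarize` are hypothetical.

```python
import math

# Per-run "Questions Solved" counts and "Total Questions", copied from the tables above.
RUNS = {
    "AIME24": ([15, 15, 16, 21, 11], 30),
    "AIME25": ([11, 13, 11, 9, 13], 30),
    "AMC23": ([35, 37, 36, 36, 35], 40),
    "GPQADiamond": ([47, 51, 96], 198),
    "MATH500": ([445], 500),
}


def summarize(solved, total):
    """Return (mean accuracy in %, spread in %) across runs.

    Assumption: spread = population std (divide by n) / sqrt(n).
    The card does not document its formula; this one matches its numbers.
    """
    accuracies = [100.0 * s / total for s in solved]
    n = len(accuracies)
    mean = sum(accuracies) / n
    population_variance = sum((a - mean) ** 2 for a in accuracies) / n
    spread = math.sqrt(population_variance) / math.sqrt(n)
    return mean, spread


if __name__ == "__main__":
    for name, (solved, total) in RUNS.items():
        mean, spread = summarize(solved, total)
        if len(solved) > 1:
            print(f"{name}: {mean:.2f}% ± {spread:.2f}%")
        else:
            # A single run has no run-to-run spread to report.
            print(f"{name}: {mean:.2f}%")
```

Because the spread is computed from only 3 to 5 runs, it is a rough indication of run-to-run variability rather than a tight confidence interval.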
Provider:
maas
Created:
2025-10-03



