
DeepSeek-R1-Distill-Qwen-7B_eval_03-07-25_19-30_0981

ModelScope community. Updated 2025-12-03; indexed 2025-11-22.
Download link:
https://modelscope.cn/datasets/mlfoundations-dev/DeepSeek-R1-Distill-Qwen-7B_eval_03-07-25_19-30_0981
Description:
# mlfoundations-dev/DeepSeek-R1-Distill-Qwen-7B_eval_03-07-25_19-30_0981

Precomputed model outputs for evaluation.

## Evaluation Results

### Summary

| Metric | AIME24 | AIME25 | AMC23 | GPQADiamond | MATH500 |
|--------|--------|--------|-------|-------------|---------|
| Accuracy (%) | 52.0 | 38.0 | 89.5 | 32.7 | 89.0 |

### AIME24

- **Average Accuracy**: 52.00% ± 4.77%
- **Number of Runs**: 5

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|------------------|-----------------|
| 1 | 50.00% | 15 | 30 |
| 2 | 50.00% | 15 | 30 |
| 3 | 53.33% | 16 | 30 |
| 4 | 70.00% | 21 | 30 |
| 5 | 36.67% | 11 | 30 |

### AIME25

- **Average Accuracy**: 38.00% ± 2.23%
- **Number of Runs**: 5

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|------------------|-----------------|
| 1 | 36.67% | 11 | 30 |
| 2 | 43.33% | 13 | 30 |
| 3 | 36.67% | 11 | 30 |
| 4 | 30.00% | 9 | 30 |
| 5 | 43.33% | 13 | 30 |

### AMC23

- **Average Accuracy**: 89.50% ± 0.84%
- **Number of Runs**: 5

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|------------------|-----------------|
| 1 | 87.50% | 35 | 40 |
| 2 | 92.50% | 37 | 40 |
| 3 | 90.00% | 36 | 40 |
| 4 | 90.00% | 36 | 40 |
| 5 | 87.50% | 35 | 40 |

### GPQADiamond

- **Average Accuracy**: 32.66% ± 6.48%
- **Number of Runs**: 3

| Run | Accuracy | Questions Solved | Total Questions |
|-----|----------|------------------|-----------------|
| 1 | 23.74% | 47 | 198 |
| 2 | 25.76% | 51 | 198 |
| 3 | 48.48% | 96 | 198 |

### MATH500

- **Accuracy**: 89.00%

| Accuracy | Questions Solved | Total Questions |
|----------|------------------|-----------------|
| 89.00% | 445 | 500 |
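The card does not say how the "±" figures are derived. As a rough check, the sketch below recomputes each benchmark's average accuracy from the per-run solved counts reported above and, as an assumption, treats the "±" value as the population standard deviation of the per-run accuracies divided by the square root of the number of runs. That formula is an inference that happens to agree with the reported numbers, not something the source documents. The names `RUNS` and `summarize` are hypothetical.

```python
import math
import statistics

# (questions solved, total questions) per run, copied from the tables above.
RUNS = {
    "AIME24": [(15, 30), (15, 30), (16, 30), (21, 30), (11, 30)],
    "AIME25": [(11, 30), (13, 30), (11, 30), (9, 30), (13, 30)],
    "AMC23": [(35, 40), (37, 40), (36, 40), (36, 40), (35, 40)],
    "GPQADiamond": [(47, 198), (51, 198), (96, 198)],
}


def summarize(runs):
    """Return (mean accuracy, spread) in percent for a list of runs.

    Assumption: spread = population std dev / sqrt(number of runs).
    The dataset card does not state which formula it uses.
    """
    accuracies = [100.0 * solved / total for solved, total in runs]
    mean = statistics.mean(accuracies)
    spread = statistics.pstdev(accuracies) / math.sqrt(len(accuracies))
    return mean, spread


if __name__ == "__main__":
    for name, runs in RUNS.items():
        mean, spread = summarize(runs)
        print(f"{name}: {mean:.2f}% ± {spread:.2f}%")
```

Note the wide run-to-run variation on GPQADiamond (47, 51, and 96 questions solved out of 198), which is what drives its comparatively large "±" value.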

Provider:
maas
Created:
2025-10-03