DeathMath-leaderboard-metainfo
Source: ModelScope community · updated 2025-11-12 · indexed 2025-11-03
Download link:
https://modelscope.cn/datasets/Vikhrmodels/DeathMath-leaderboard-metainfo
Resource description:
# DeathMath Leaderboard
DeathMath is a benchmark for evaluating how well models solve hard math and physics problems in Russian.
## Current leaderboard
Last updated: 2025-10-30 12:02:16
| Model | Overall score | Math | Physics | Tokens | Evaluation time |
|--------|------------|------------|---------|---------|--------------|
| AlexWortega/Gemeni 2.5 Pro | 0.728 | 0.874 | 0.582 | 2,227,721 | 4937.3s |
| Anonumous/GPT-5 | 0.705 | 0.910 | 0.500 | 1,374,085 | 4908.4s |
| o3-mini-high | 0.692 | 0.884 | 0.500 | 2,186,756 | 5107.5s |
| Anonumous/GPT-OSS-120B | 0.675 | 0.849 | 0.500 | 671,703 | 939.1s |
| o3 | 0.669 | 0.868 | 0.469 | 1,164,000 | 5025.8s |
| o4-mini-high | 0.664 | 0.868 | 0.459 | 1,997,548 | 5811.0s |
| Anonumous/GPT-5 Nano | 0.649 | 0.839 | 0.459 | 2,218,450 | 5109.7s |
| Anonumous/GPT-5 Mini | 0.634 | 0.849 | 0.418 | 993,326 | 3368.5s |
| Anonumous/Claude Opus 4.1 | 0.607 | 0.704 | 0.510 | 448,628 | 1927.2s |
| Anonumous/GPT-OSS-20B | 0.583 | 0.789 | 0.378 | 1,034,077 | 4009.3s |
| AlexWortega/Claude Sonnet 4 | 0.551 | 0.633 | 0.469 | 490,996 | 1294.7s |
| Qwen QwQ 32B | 0.530 | 0.653 | 0.408 | 2,112,951 | 16974.7s |
| Gemini 2.0 Flash | 0.514 | 0.558 | 0.469 | 495,313 | 736.6s |
| Claude 3.7 Sonnet | 0.470 | 0.542 | 0.398 | 405,583 | 1082.0s |
| gpt-4.1 | 0.466 | 0.584 | 0.347 | 549,983 | 2434.6s |
| LakoMoor/QVikhr-3-8B-Instruction | 0.445 | 0.563 | 0.327 | 1,486,327 | 11874.2s |
| LakoMoor/Qwen3-8B | 0.417 | 0.538 | 0.296 | 1,576,445 | 12744.3s |
| Gemma 3 27B | 0.400 | 0.474 | 0.327 | 384,164 | 3024.3s |
| Claude 3.5 Sonnet | 0.376 | 0.416 | 0.337 | 252,843 | 702.0s |
| DeepSeek R1 Distill Qwen 14B | 0.346 | 0.447 | 0.245 | 806,258 | 7904.1s |
| DeepSeek V3 0324 | 0.343 | 0.432 | 0.255 | 339,857 | 2901.8s |
| gpt-4o | 0.338 | 0.432 | 0.245 | 399,483 | 1145.0s |
| GigaChat-2-Max | 0.314 | 0.363 | 0.265 | 185,204 | 965.8s |
| AvitoTech/A-vibe | 0.280 | 0.367 | 0.194 | 797,771 | 4097.4s |
| GigaChat-2-Pro | 0.270 | 0.316 | 0.224 | 215,297 | 1250.3s |
| Qwen2.5 72B Instruct | 0.189 | 0.379 | 0.000 | 322,441 | 5670.7s |
| GigaChat-Max | 0.181 | 0.189 | 0.173 | 200,271 | 1056.5s |
| Gemma 3 4B | 0.180 | 0.258 | 0.102 | 726,285 | 2959.4s |
| GigaChat-2 | 0.083 | 0.095 | 0.071 | 136,051 | 576.9s |
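The source does not state how the overall score is computed. In the rows above it appears to match the unweighted mean of the math and physics scores, to within rounding of the displayed values. The sketch below checks that observation on a few rows copied from the table; the formula and the helper names `mean_of_subjects` and `max_deviation` are assumptions for illustration, not the benchmark's documented scoring code.

```python
# Rows copied from the leaderboard table: (model, overall, math, physics).
ROWS = [
    ("AlexWortega/Gemeni 2.5 Pro", 0.728, 0.874, 0.582),
    ("Anonumous/GPT-5", 0.705, 0.910, 0.500),
    ("Anonumous/GPT-OSS-20B", 0.583, 0.789, 0.378),
    ("Qwen2.5 72B Instruct", 0.189, 0.379, 0.000),
    ("GigaChat-2", 0.083, 0.095, 0.071),
]


def mean_of_subjects(math_score: float, physics_score: float) -> float:
    """Unweighted mean of the two subject scores (an assumed formula)."""
    return (math_score + physics_score) / 2


def max_deviation(rows) -> float:
    """Largest gap between the listed overall score and the assumed mean."""
    return max(
        abs(overall - mean_of_subjects(math, physics))
        for _, overall, math, physics in rows
    )


if __name__ == "__main__":
    for model, overall, math, physics in ROWS:
        estimate = mean_of_subjects(math, physics)
        print(f"{model}: listed={overall:.3f} mean={estimate:.4f}")
    print(f"max deviation: {max_deviation(ROWS):.4f}")
```

Because the table shows scores rounded to three decimals, a gap of up to about 0.001 between the listed value and the recomputed mean is expected even if the assumption holds.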
## How to take part in the benchmark
To take part in the DeathMath benchmark:
1. Clone the repository and run the tests on your model
2. Upload the results through the [HuggingFace Space](https://huggingface.co/spaces/Vikhrmodels/DeathMath-leaderboard)
3. Wait for the results to be reviewed and added to the leaderboard
## Results format
Results must be in JSON format with the following structure:
```json
{
  "score": 0.586,
  "math_score": 0.8,
  "physics_score": 0.373,
  "total_tokens": 1394299,
  "evaluation_time": 4533.2,
  "system_prompt": "Вы - полезный помощник по математике и физике. Ответьте на русском языке."
}
```
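Before uploading, it may help to check a results file against the structure shown above. The following is a minimal sketch: the field names are taken from the example, while the expected types, the 0 to 1 score range, and the helper name `validate_result` are assumptions inferred from that example rather than rules published by the leaderboard.

```python
import json

# Field names come from the example above; the expected types are
# assumptions inferred from the example values.
EXPECTED_FIELDS = {
    "score": (int, float),
    "math_score": (int, float),
    "physics_score": (int, float),
    "total_tokens": (int,),
    "evaluation_time": (int, float),
    "system_prompt": (str,),
}
SCORE_FIELDS = ("score", "math_score", "physics_score")


def validate_result(result) -> list:
    """Return a list of human-readable problems; an empty list means OK."""
    if not isinstance(result, dict):
        return ["top-level value must be a JSON object"]
    problems = []
    for name, types in EXPECTED_FIELDS.items():
        if name not in result:
            problems.append(f"missing field: {name}")
            continue
        value = result[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, types):
            problems.append(f"wrong type for {name}: {type(value).__name__}")
        elif name in SCORE_FIELDS and not 0.0 <= value <= 1.0:
            # Assumed range: every score on the leaderboard lies in [0, 1].
            problems.append(f"{name} out of range: {value}")
    return problems


if __name__ == "__main__":
    text = '{"score": 0.586, "math_score": 0.8, "physics_score": 0.373}'
    for problem in validate_result(json.loads(text)):
        print(problem)
```

Returning a list of problems rather than raising on the first one lets a submitter fix every issue in a single pass.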
## License
The benchmark is distributed under the Apache 2.0 license.
Provider:
maas
Created:
2025-09-19



