kth8/Qwen3.5-9B-MMLU-Pro-benchmark
Hugging Face · updated 2026-04-10 · indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/kth8/Qwen3.5-9B-MMLU-Pro-benchmark
Description:
---
license: apache-2.0
language:
- en
base_model: Qwen/Qwen3.5-9B
datasets:
- TIGER-Lab/MMLU-Pro
---
Benchmark of [Qwen/Qwen3.5-9B](https://huggingface.co/Qwen/Qwen3.5-9B) on the [TIGER-Lab/MMLU-Pro](https://huggingface.co/datasets/TIGER-Lab/MMLU-Pro) dataset.

Accuracy: 77.8% (778 of 1,000 samples) with the Python tool available to the model.

| Metric | Value |
|----------------------|---------------|
| **Correct** | 778 |
| **Incorrect** | 222 |
| **Errors** | 0 |
| **Total samples** | 1000 |
| **Python tool calls**| 1089 |
| **Total completion tokens** | 1,785,437 |
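
The per-sample averages below are plain arithmetic on the table. The 95% Wilson score interval is an addition that the card does not report. It assumes the 1,000 samples behave like independent random draws from MMLU-Pro, which the card does not state. Treat it as a rough sketch of how much the 77.8% figure could move on another sample of the same size.

```python
import math

# Figures copied from the metrics table of this card.
CORRECT = 778
TOTAL = 1000
TOOL_CALLS = 1089
COMPLETION_TOKENS = 1_785_437


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%).

    Assumes independent samples; this is an illustrative addition, not
    something the benchmark card itself reports.
    """
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return centre - half, centre + half


accuracy = CORRECT / TOTAL
low, high = wilson_interval(CORRECT, TOTAL)
tool_calls_per_sample = TOOL_CALLS / TOTAL
tokens_per_sample = COMPLETION_TOKENS / TOTAL

print(f"accuracy            : {accuracy:.3f}")
print(f"95% Wilson interval : [{low:.3f}, {high:.3f}]")
print(f"tool calls / sample : {tool_calls_per_sample:.3f}")
print(f"tokens / sample     : {tokens_per_sample:.1f}")
```

With roughly one Python call and about 1,785 completion tokens per question on average, the tool-use and token budget can be compared directly against runs that use a different sample size.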
Raw stats:
```json
{
"accuracy": 0.778,
"correct": 778,
"incorrect": 222,
"error": 0,
"total": 1000,
"python_tool_calls": 1089,
"completion_tokens": 1785437
}
```
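
The raw stats object can be checked for internal consistency before it is reused elsewhere. The sketch below parses the JSON exactly as printed above. It assumes `accuracy` is defined as `correct / total`; because `error` is 0 here, that is indistinguishable from `correct / (correct + incorrect)`. The helper name `check_stats` is hypothetical.

```python
import json

# The "Raw stats" object from this card, pasted verbatim as a string.
RAW_STATS = """
{
  "accuracy": 0.778,
  "correct": 778,
  "incorrect": 222,
  "error": 0,
  "total": 1000,
  "python_tool_calls": 1089,
  "completion_tokens": 1785437
}
"""


def check_stats(stats: dict) -> list[str]:
    """Return a list of inconsistencies in a raw-stats dict (empty if none).

    Hypothetical helper; assumes accuracy == correct / total.
    """
    problems = []
    counted = stats["correct"] + stats["incorrect"] + stats["error"]
    if counted != stats["total"]:
        problems.append(f"correct+incorrect+error = {counted}, but total = {stats['total']}")
    expected_acc = stats["correct"] / stats["total"]
    # Accuracy is reported to three decimals, so allow half a unit of rounding.
    if abs(expected_acc - stats["accuracy"]) > 5e-4:
        problems.append(f"accuracy {stats['accuracy']} != correct/total {expected_acc:.4f}")
    return problems


stats = json.loads(RAW_STATS)
issues = check_stats(stats)
print("consistent" if not issues else "\n".join(issues))
```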
Provided by: kth8



