kth8/Qwen3.5-4B-Claude-Opus-Reasoning-Distill-GPQA-Diamond-benchmark
Source: Hugging Face. Updated 2026-04-10; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/kth8/Qwen3.5-4B-Claude-Opus-Reasoning-Distill-GPQA-Diamond-benchmark
Description:
---
license: apache-2.0
language:
- en
base_model: TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill
datasets:
- fingertap/GPQA-Diamond
---
Benchmark of [TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill](https://huggingface.co/TeichAI/Qwen3.5-4B-Claude-Opus-Reasoning-Distill) on the [fingertap/GPQA-Diamond](https://huggingface.co/datasets/fingertap/GPQA-Diamond) dataset.
Accuracy: 60.1% (119 of 198 samples correct) with Python tool.
| Metric | Value |
|----------------------|---------------|
| **Correct** | 119 |
| **Incorrect** | 78 |
| **Errors** | 1 |
| **Total samples** | 198 |
| **Python tool calls**| 189 |
| **Total completion tokens** | 871,865 |
Raw stats:
```json
{
  "accuracy": 0.601,
  "correct": 119,
  "incorrect": 78,
  "error": 1,
  "total": 198,
  "python_tool_calls": 189,
  "completion_tokens": 871865
}
```
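The headline figure can be re-derived from the raw counts. The following is a minimal sketch, not the script used to produce the benchmark; the `summarize` helper and its output keys are hypothetical names chosen for illustration. It assumes the one errored sample stays in the denominator, which is consistent with the reported 0.601 (119 / 198).

```python
import json

# Raw stats copied from the card above.
raw_stats = json.loads("""
{
  "accuracy": 0.601,
  "correct": 119,
  "incorrect": 78,
  "error": 1,
  "total": 198,
  "python_tool_calls": 189,
  "completion_tokens": 871865
}
""")


def summarize(stats):
    """Derive per-sample figures from the raw counts (hypothetical helper)."""
    total = stats["total"]
    # Assumption: errored samples count as not correct, so they remain
    # in the denominator.
    accuracy = stats["correct"] / total
    return {
        "accuracy_pct": round(100 * accuracy, 1),
        "counts_consistent": (
            stats["correct"] + stats["incorrect"] + stats["error"] == total
        ),
        "tokens_per_sample": round(stats["completion_tokens"] / total, 1),
        "tool_calls_per_sample": round(stats["python_tool_calls"] / total, 2),
    }


summary = summarize(raw_stats)
print(summary)
```

Rounding the percentage once, after dividing the counts, avoids floating-point residue such as `60.099999999999994`.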
Provider:
kth8



