raca-workspace-v1/algo-sft-eval-baseline-Qwen2.5-1.5B-Instruct-v4
Source: Hugging Face · Updated 2026-04-04 · Indexed 2026-04-12
Download link:
https://hf-mirror.com/datasets/raca-workspace-v1/algo-sft-eval-baseline-Qwen2.5-1.5B-Instruct-v4
Resource description:
---
license: mit
tags:
- algo-sft-eval-redo
- baseline
---
# algo-sft-eval-baseline-Qwen2.5-1.5B-Instruct-v4
Baseline evaluation traces for Qwen2.5-1.5B-Instruct across all 4 domains × 3 splits (test, harder, ood).
## Dataset Info
- **Rows**: 8000
- **Columns**: 10
## Columns
| Column | Type | Description |
|--------|------|-------------|
| question_id | Value('string') | Unique question identifier from eval set |
| domain | Value('string') | Task domain |
| split | Value('string') | Evaluation split: test, harder, ood |
| prompt | Value('string') | Full prompt sent to the model |
| model_response | Value('string') | Complete untruncated model output |
| extracted_answer | Value('string') | Answer extracted by domain-specific parser |
| ground_truth | Value('string') | Expected correct answer |
| correct | Value('bool') | Whether extracted_answer matched ground_truth |
| finish_reason | Value('string') | vLLM finish reason: `stop` (generation ended on its own) or `length` (cut off at the token limit) |
| token_count | Value('int64') | Number of tokens in model_response |
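Because every row carries `domain`, `split`, and `correct`, per-cell accuracy can be computed directly from the columns above. The following is a minimal sketch, not part of the original card: the helper name `accuracy_by_group` is hypothetical, and the sample rows (including the placeholder domain name `domain_a`) are invented for illustration, since the card does not list the actual domain names.

```python
from collections import defaultdict


def accuracy_by_group(rows):
    """Return {(domain, split): fraction of rows where correct is True}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        key = (row["domain"], row["split"])
        totals[key] += 1
        hits[key] += int(row["correct"])
    return {key: hits[key] / totals[key] for key in totals}


# Invented rows for illustration only; "domain_a" is a placeholder,
# not one of the dataset's real domain names.
sample_rows = [
    {"domain": "domain_a", "split": "test", "correct": True},
    {"domain": "domain_a", "split": "test", "correct": False},
    {"domain": "domain_a", "split": "ood", "correct": True},
]

scores = accuracy_by_group(sample_rows)
for (domain, split), acc in sorted(scores.items()):
    print(f"{domain:10s} {split:8s} {acc:.2f}")
```

The helper only assumes that iterating over its argument yields dict-like rows, so a loaded `datasets.Dataset` can be passed in place of `sample_rows`.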
## Generation Parameters
```json
{
  "script_name": "eval_baseline.py",
  "model": "Qwen2.5-1.5B-Instruct",
  "description": "Baseline eval traces for Qwen2.5-1.5B-Instruct across all 4 domains \u00d7 3 splits",
  "hyperparameters": {
    "max_tokens": 32768,
    "max_model_len": 32768,
    "temperature": 0.0
  },
  "input_datasets": []
}
```
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("raca-workspace-v1/algo-sft-eval-baseline-Qwen2.5-1.5B-Instruct-v4", split="train")
print(f"Loaded {len(dataset)} rows")
```
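Once the rows are loaded, one plausible first check is how often generation was cut off, since a `finish_reason` of `length` means the response hit the token limit. The sketch below is an illustration rather than part of the original card: `truncation_summary` is a hypothetical helper, and the sample rows are invented so the block runs without downloading anything.

```python
def truncation_summary(rows):
    """Summarize truncation and response length over dict-like rows."""
    n_rows = 0
    n_truncated = 0
    total_tokens = 0
    for row in rows:
        n_rows += 1
        n_truncated += int(row["finish_reason"] == "length")
        total_tokens += row["token_count"]
    return {
        "rows": n_rows,
        "truncated": n_truncated,
        # Guard against an empty input so the helper never divides by zero.
        "truncated_rate": n_truncated / n_rows if n_rows else 0.0,
        "mean_tokens": total_tokens / n_rows if n_rows else 0.0,
    }


# Invented rows for illustration only; real values come from the dataset.
sample_rows = [
    {"finish_reason": "stop", "token_count": 100},
    {"finish_reason": "stop", "token_count": 300},
    {"finish_reason": "length", "token_count": 32768},
    {"finish_reason": "stop", "token_count": 200},
]

summary = truncation_summary(sample_rows)
print(summary)
```

To run it on the real data, pass the loaded `dataset` object instead of `sample_rows`.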
---
*Uploaded via [RACA](https://github.com/Zayne-sprague/Dr-Claude-Code) hf_utility.*
Provider:
raca-workspace-v1



