raca-workspace-v1/algo-sft-eval-traces-cellular-automata-distill-qwq-v4
Source: Hugging Face. Updated 2026-04-03; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/raca-workspace-v1/algo-sft-eval-traces-cellular-automata-distill-qwq-v4
Description:
---
license: mit
tags:
- algo-sft-eval-redo
- cellular_automata
- distill
---
# algo-sft-eval-traces-cellular-automata-distill-qwq-v4
Full eval traces for algo-sft-cellular-automata-distill-qwq across test/harder/ood splits
## Dataset Info
- **Rows**: 2000
- **Columns**: 11
## Columns
| Column | Type | Description |
|--------|------|-------------|
| question_id | Value('string') | Unique question identifier from eval set |
| split | Value('string') | Evaluation split: test (in-distribution), harder (scaled up), ood (structural out-of-distribution) |
| domain | Value('string') | Task domain: formal_logic, conlang_morphology, cellular_automata, long_arithmetic |
| task | Value('string') | Specific task variant (e.g., formal_logic_bottom_up) |
| prompt | Value('string') | Full prompt sent to the model |
| model_response | Value('string') | Complete untruncated model output |
| extracted_answer | Value('string') | Answer extracted by domain-specific parser |
| ground_truth | Value('string') | Expected correct answer |
| correct | Value('bool') | Whether extracted_answer matched ground_truth |
| finish_reason | Value('string') | vLLM finish reason: stop (natural end) or length (hit max_tokens) |
| token_count | Value('int64') | Number of tokens in model_response |
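
The `split` and `correct` columns described above are enough to compute per-split accuracy. The following is a minimal sketch, not part of the original card: the helper name `accuracy_by_split` and the sample rows are hypothetical and used only for illustration. It assumes each row is a dict-like object with the column names listed in the table, which is how rows appear when iterating over a loaded `datasets` split.

```python
from collections import defaultdict


def accuracy_by_split(rows):
    """Return {split: fraction of rows whose `correct` flag is True}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        totals[row["split"]] += 1
        hits[row["split"]] += int(row["correct"])
    return {split: hits[split] / totals[split] for split in totals}


# Hypothetical rows for illustration only; these are NOT taken from the dataset.
sample_rows = [
    {"split": "test", "correct": True},
    {"split": "test", "correct": False},
    {"split": "harder", "correct": False},
    {"split": "ood", "correct": True},
]

print(accuracy_by_split(sample_rows))
```

The same function should work unchanged when passed the loaded dataset object instead of `sample_rows`, since it only iterates over rows and reads two keys.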
## Generation Parameters
```json
{
  "script_name": "eval_model.py",
  "model": "reasoning-degeneration-dev/algo-sft-cellular-automata-distill-qwq",
  "description": "Full eval traces for algo-sft-cellular-automata-distill-qwq across test/harder/ood splits",
  "hyperparameters": {
    "max_tokens": 32768,
    "max_model_len": 32768,
    "temperature": 0.0,
    "base_model": "Qwen/Qwen2.5-1.5B-Instruct"
  },
  "input_datasets": []
}
```
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("raca-workspace-v1/algo-sft-eval-traces-cellular-automata-distill-qwq-v4", split="train")
print(f"Loaded {len(dataset)} rows")
```
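
Once loaded, the `finish_reason` and `token_count` columns can be used to check how many responses hit the `max_tokens` limit. The sketch below is an illustration under stated assumptions rather than part of the original card: `truncation_summary` is a hypothetical helper, and the sample rows are invented so the snippet runs without downloading anything. Passing the `dataset` object from the snippet above in place of `sample_rows` should work the same way, assuming rows iterate as dicts.

```python
def truncation_summary(rows):
    """Summarize how many responses ended with finish_reason == 'length'."""
    rows = list(rows)
    truncated = [r for r in rows if r["finish_reason"] == "length"]
    return {
        "total": len(rows),
        "truncated": len(truncated),
        # Truncated responses that were still scored as correct.
        "truncated_correct": sum(bool(r["correct"]) for r in truncated),
        "max_token_count": max((r["token_count"] for r in rows), default=0),
    }


# Hypothetical rows for illustration only; these are NOT taken from the dataset.
sample_rows = [
    {"finish_reason": "stop", "correct": True, "token_count": 512},
    {"finish_reason": "length", "correct": False, "token_count": 32768},
    {"finish_reason": "stop", "correct": False, "token_count": 2048},
]

print(truncation_summary(sample_rows))
```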
---
*Uploaded via [RACA](https://github.com/Zayne-sprague/Dr-Claude-Code) hf_utility.*
Provider:
raca-workspace-v1



