raca-workspace-v1/algo-sft-eval-traces-cellular-automata-distill-qwq-v4

Name: raca-workspace-v1/algo-sft-eval-traces-cellular-automata-distill-qwq-v4
Creator: raca-workspace-v1
Published: 2026-04-03 00:54:14
License: MIT

Source: Hugging Face. Updated 2026-04-03; indexed 2026-04-12.

Download link:

https://hf-mirror.com/datasets/raca-workspace-v1/algo-sft-eval-traces-cellular-automata-distill-qwq-v4


Description:

---
license: mit
tags:
- algo-sft-eval-redo
- cellular_automata
- distill
---

# algo-sft-eval-traces-cellular-automata-distill-qwq-v4

Full eval traces for algo-sft-cellular-automata-distill-qwq across test/harder/ood splits

## Dataset Info

- **Rows**: 2000
- **Columns**: 11

## Columns

| Column | Type | Description |
|--------|------|-------------|
| question_id | Value('string') | Unique question identifier from eval set |
| split | Value('string') | Evaluation split: test (in-distribution), harder (scaled up), ood (structural out-of-distribution) |
| domain | Value('string') | Task domain: formal_logic, conlang_morphology, cellular_automata, long_arithmetic |
| task | Value('string') | Specific task variant (e.g., formal_logic_bottom_up) |
| prompt | Value('string') | Full prompt sent to the model |
| model_response | Value('string') | Complete untruncated model output |
| extracted_answer | Value('string') | Answer extracted by domain-specific parser |
| ground_truth | Value('string') | Expected correct answer |
| correct | Value('bool') | Whether extracted_answer matched ground_truth |
| finish_reason | Value('string') | vLLM finish reason: stop (natural end) or length (hit max_tokens) |
| token_count | Value('int64') | Number of tokens in model_response |

## Generation Parameters

```json
{
  "script_name": "eval_model.py",
  "model": "reasoning-degeneration-dev/algo-sft-cellular-automata-distill-qwq",
  "description": "Full eval traces for algo-sft-cellular-automata-distill-qwq across test/harder/ood splits",
  "hyperparameters": {
    "max_tokens": 32768,
    "max_model_len": 32768,
    "temperature": 0.0,
    "base_model": "Qwen/Qwen2.5-1.5B-Instruct"
  },
  "input_datasets": []
}
```

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("raca-workspace-v1/algo-sft-eval-traces-cellular-automata-distill-qwq-v4", split="train")
print(f"Loaded {len(dataset)} rows")
```

---

*Uploaded via [RACA](https://github.com/Zayne-sprague/Dr-Claude-Code) hf_utility.*
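
A natural first step with these traces is to break results down by the `split` column. The sketch below is not part of the original card: `summarize_by_split` is a hypothetical helper that assumes each row is a dict with the `split`, `correct`, `finish_reason`, and `token_count` columns documented in the card, and the sample rows are hand-made placeholders rather than real dataset values.

```python
from collections import defaultdict


def summarize_by_split(rows):
    """Group rows by `split` and compute simple per-split statistics.

    Hypothetical helper (not from the dataset card). `rows` is any iterable
    of dicts carrying the documented columns.
    """
    buckets = defaultdict(list)
    for row in rows:
        buckets[row["split"]].append(row)

    summary = {}
    for split, items in buckets.items():
        n = len(items)
        summary[split] = {
            "n": n,
            # Fraction of rows where extracted_answer matched ground_truth.
            "accuracy": sum(bool(r["correct"]) for r in items) / n,
            # Fraction of rows that stopped because max_tokens was reached.
            "truncated_rate": sum(r["finish_reason"] == "length" for r in items) / n,
            "mean_tokens": sum(r["token_count"] for r in items) / n,
        }
    return summary


# Hand-made placeholder rows, only to show the expected input shape.
sample_rows = [
    {"split": "test", "correct": True, "finish_reason": "stop", "token_count": 100},
    {"split": "test", "correct": False, "finish_reason": "stop", "token_count": 300},
    {"split": "ood", "correct": False, "finish_reason": "length", "token_count": 32768},
]
sample_summary = summarize_by_split(sample_rows)
for split, stats in sample_summary.items():
    print(split, stats)
```

Because iterating a `datasets.Dataset` yields one dict per row, the same helper should work on the object returned by `load_dataset(..., split="train")`. Reporting the truncation rate next to accuracy helps separate wrong answers from responses that were cut off at `max_tokens`.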


Provider:

raca-workspace-v1
