wzekai99/ORCA

Name: wzekai99/ORCA
Creator: wzekai99
Published: 2026-04-27 04:54:41
License: not specified

Source: Hugging Face. Updated: 2026-04-27. Indexed: 2026-05-03.

Download link:

https://hf-mirror.com/datasets/wzekai99/ORCA
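
As a minimal sketch of how the files might be fetched programmatically, the snippet below uses `snapshot_download` from the `huggingface_hub` package and points it at the mirror through the `HF_ENDPOINT` environment variable. The helper name `download_orca` and the `orca_data` target directory are hypothetical, and the repository's file layout is not described on this page, so inspect the downloaded folder before relying on specific file names.

```python
import os

REPO_ID = "wzekai99/ORCA"                   # dataset id from this listing
MIRROR_ENDPOINT = "https://hf-mirror.com"   # mirror host from the download link


def download_orca(local_dir="orca_data", use_mirror=True):
    """Download a snapshot of the dataset repo and return its local path.

    `download_orca` and `local_dir` are illustrative names, not part of the
    dataset. Requires `pip install huggingface_hub` and network access.
    """
    if use_mirror:
        # HF_ENDPOINT is read when huggingface_hub is first imported,
        # so set it before the import below.
        os.environ.setdefault("HF_ENDPOINT", MIRROR_ENDPOINT)
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print(download_orca())
```

If `huggingface_hub` was already imported elsewhere in the process, the endpoint override may not take effect; in that case, export `HF_ENDPOINT` in the shell before starting Python.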

Overview:

The ORCA Step-Level Embeddings and Labels dataset provides preprocessed step embeddings and step labels for the paper Online Reasoning Calibration: Test-Time Training Enables Generalizable Conformal LLM Reasoning. For every problem in each upstream dataset, a single reasoning trajectory is generated with DeepSeek-R1-671B. Mean-pooled last-layer hidden states are then extracted from the target LLM at every reasoning step, and a Qwen3-32B teacher produces two label sets: a supervised correctness label and a label-free consistent label that compares the intermediate answer to the full-budget answer. The dataset covers multiple upstream datasets (e.g., s1K, OpenR1-Math, DeepMath-103K, MATH-500, GPQA-Diamond, AIME 2024/2025/2026) and ships train/calibration/test splits for reasoning calibration research.
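
The two preprocessing ideas in the overview, mean pooling per reasoning step and the label-free consistent label, can be illustrated with a small pure-Python sketch. This is not the authors' pipeline: the function names, the `(start, end)` token-span representation of a step, and exact string matching between answers are all assumptions made for illustration, and the toy numbers stand in for real hidden states.

```python
def mean_pool_steps(hidden_states, step_bounds):
    """Average token-level vectors inside each reasoning step.

    hidden_states: list of per-token vectors (last-layer hidden states).
    step_bounds:   list of (start, end) token indices, end exclusive
                   (an assumed way of marking step boundaries).
    Returns one pooled vector per step.
    """
    pooled = []
    for start, end in step_bounds:
        rows = hidden_states[start:end]
        if not rows:
            raise ValueError(f"empty step span: ({start}, {end})")
        dim = len(rows[0])
        pooled.append([sum(r[d] for r in rows) / len(rows) for d in range(dim)])
    return pooled


def consistent_labels(intermediate_answers, full_budget_answer):
    """Label-free signal: 1 if a step's intermediate answer matches the
    answer obtained with the full reasoning budget, else 0.
    Exact match after stripping whitespace is an assumption."""
    target = full_budget_answer.strip()
    return [int(a.strip() == target) for a in intermediate_answers]


# Toy example: 6 tokens with 2-dimensional hidden states, split into 2 steps.
hidden = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0], [10.0, 11.0]]
step_embeddings = mean_pool_steps(hidden, [(0, 2), (2, 6)])
labels = consistent_labels(["3", "4", "4 "], "4")

print(step_embeddings)  # [[1.0, 2.0], [7.0, 8.0]]
print(labels)           # [0, 1, 1]
```

The supervised correctness label would follow the same shape, with a ground-truth answer in place of the full-budget answer; how the teacher model extracts and compares answers in practice is not specified on this page.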

Provided by:

wzekai99
