wzekai99/ORCA
Source: Hugging Face. Updated 2026-04-27. Indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/wzekai99/ORCA
Description:
The ORCA Step-Level Embeddings and Labels dataset provides preprocessed step embeddings and step labels for the paper "Online Reasoning Calibration: Test-Time Training Enables Generalizable Conformal LLM Reasoning". For every problem in each upstream dataset, a single reasoning trajectory is generated with DeepSeek-R1-671B, and mean-pooled last-layer hidden states are extracted from the target LLM at every reasoning step. A Qwen3-32B teacher then produces two label sets: a supervised correctness label, and a label-free consistency label that compares the intermediate answer at each step to the full-budget answer. The dataset covers multiple upstream datasets (e.g., s1K, OpenR1-Math, DeepMath-103K, MATH-500, GPQA-Diamond, AIME 2024/2025/2026) and is organized into train, calibration, and test splits for reasoning calibration research.
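The two per-step quantities described above can be illustrated with a minimal sketch. It is not the dataset's actual preprocessing code: the function names `mean_pool_steps` and `consistent_labels`, the token-span representation of a step, and the exact-string comparison of answers are all assumptions made for illustration. In the dataset itself the labels come from a Qwen3-32B teacher, whose comparison procedure is not specified here.

```python
import numpy as np


def mean_pool_steps(hidden_states, step_bounds):
    """Average last-layer hidden states over the tokens of each reasoning step.

    hidden_states: array of shape (num_tokens, hidden_dim).
    step_bounds:   list of (start, end) token indices, end exclusive
                   (hypothetical representation of step boundaries).
    Returns an array of shape (num_steps, hidden_dim).
    """
    return np.stack([hidden_states[s:e].mean(axis=0) for s, e in step_bounds])


def consistent_labels(intermediate_answers, full_budget_answer):
    """Label a step 1 if its intermediate answer matches the full-budget answer.

    Assumption: answers are compared as whitespace-stripped strings. No
    ground-truth answer is needed, which is what makes the label label-free.
    """
    target = full_budget_answer.strip()
    return np.array([int(a.strip() == target) for a in intermediate_answers])


# Toy trajectory: 6 tokens with hidden size 2, split into two steps.
hidden = np.arange(12, dtype=np.float32).reshape(6, 2)
bounds = [(0, 2), (2, 6)]

step_embeddings = mean_pool_steps(hidden, bounds)
step_labels = consistent_labels(["41", "42 "], "42")

print(step_embeddings.shape)  # one embedding per step
print(step_labels)            # one 0/1 label per step
```

Each reasoning step ends up with one fixed-size embedding and one binary label, which is the shape of data a step-level calibrator would consume.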
Provided by:
wzekai99



