
wzekai99/ORCA

Source: Hugging Face. Updated: 2026-04-27. Indexed: 2026-05-03.
Download link:
https://hf-mirror.com/datasets/wzekai99/ORCA
Description:

The ORCA Step-Level Embeddings and Labels dataset provides preprocessed step embeddings and step labels for the paper "Online Reasoning Calibration: Test-Time Training Enables Generalizable Conformal LLM Reasoning". For every problem in each upstream dataset, a single reasoning trajectory is generated with DeepSeek-R1-671B, and mean-pooled last-layer hidden states are extracted from the target LLM at every reasoning step. Two label sets are produced from a Qwen3-32B teacher: a supervised correctness label and a label-free consistent label, which compares the intermediate answer with the full-budget answer. The dataset covers multiple upstream datasets (e.g., s1K, OpenR1-Math, DeepMath-103K, MATH-500, GPQA-Diamond, AIME 2024/2025/2026) and provides train/calibration/test splits for reasoning calibration research.
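The two preprocessing ideas in the description, mean-pooling hidden states per reasoning step and labelling a step by comparing its intermediate answer with the full-budget answer, can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the dataset or the paper's code: the function names, the `[start, end)` token spans, and in particular the normalised exact-match comparison, which only stands in for the Qwen3-32B teacher that produces the actual labels.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool_steps(
    hidden: Sequence[Sequence[float]],
    step_spans: Sequence[Tuple[int, int]],
) -> List[Vector]:
    """Average token-level hidden states over each [start, end) step span.

    `hidden` is assumed to hold one last-layer vector per token;
    `step_spans` is an assumed way of marking where each step begins and ends.
    """
    pooled: List[Vector] = []
    for start, end in step_spans:
        if not 0 <= start < end <= len(hidden):
            raise ValueError(f"invalid step span: ({start}, {end})")
        tokens = hidden[start:end]
        dim = len(tokens[0])
        pooled.append(
            [sum(tok[d] for tok in tokens) / len(tokens) for d in range(dim)]
        )
    return pooled


def consistent_labels(
    intermediate_answers: Sequence[str],
    full_budget_answer: str,
) -> List[int]:
    """Return 1 where a step's intermediate answer matches the full-budget answer.

    Assumption: a whitespace- and case-insensitive exact match replaces the
    teacher model's judgement purely for illustration.
    """
    target = full_budget_answer.strip().lower()
    return [int(a.strip().lower() == target) for a in intermediate_answers]


# Toy example: four tokens with 2-dimensional hidden states, split into two steps.
hidden_states = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
step_embeddings = mean_pool_steps(hidden_states, [(0, 2), (2, 4)])
step_labels = consistent_labels(["12", "42", " 42 "], "42")
```

In this toy case the first step embedding is the average of the first two token vectors, and only the intermediate answers that agree with the full-budget answer are labelled 1. No ground-truth answer is needed for this label, which is what makes it label-free.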
Provider:
wzekai99