unlearning-cleanslate/generations-simnpo_olmo-3-1025-7b_20260428_064411-debug-post-olmo
Source: Hugging Face. Updated 2026-04-29; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/unlearning-cleanslate/generations-simnpo_olmo-3-1025-7b_20260428_064411-debug-post-olmo
Description:
This dataset contains multiple configurations for AI reasoning and evaluation tasks. The main ones are: 1) arc_challenge, likely a challenging question-answering set for testing reasoning ability; 2) a family of bbh_cot_fewshot_* variants covering a broad range of tasks, such as boolean expressions, causal judgement, date understanding, disambiguation QA, Dyck languages, formal fallacies, geometric shapes, hyperbaton, logical deduction (with three, five, and seven objects), movie recommendation, multistep arithmetic, navigation, object counting, penguins in a table, reasoning about colored objects, and ruin names. The bbh_cot_fewshot_* variants all use a few-shot, chain-of-thought format to evaluate language models on complex reasoning tasks. The dataset structure includes fields such as input, target, generation arguments, responses, and evaluation metrics.
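Since the description names the configurations by pattern, a minimal sketch of how one of them might be loaded with the Hugging Face `datasets` library follows. The repository ID comes from the download link above; the task slugs, the helper names `bbh_config_name` and `load_config`, and the assumption that each configuration name is the `bbh_cot_fewshot_` prefix plus a slug are illustrative and should be checked against the dataset page.

```python
# Repository ID taken from the download link on this page.
REPO_ID = (
    "unlearning-cleanslate/"
    "generations-simnpo_olmo-3-1025-7b_20260428_064411-debug-post-olmo"
)

# ASSUMPTION: a few task slugs guessed from the task list in the description.
# The exact configuration names in the repository may differ.
BBH_TASK_SLUGS = [
    "boolean_expressions",
    "date_understanding",
    "hyperbaton",
    "ruin_names",
]


def bbh_config_name(task_slug: str) -> str:
    """Build a configuration name following the bbh_cot_fewshot_* pattern."""
    return f"bbh_cot_fewshot_{task_slug}"


def load_config(config_name: str):
    """Load one configuration of the dataset (needs network access).

    The import is local so the helpers above work without `datasets` installed.
    """
    from datasets import load_dataset  # pip install datasets

    return load_dataset(REPO_ID, config_name)


# Example usage (not run here because it downloads data):
#   ds = load_config("arc_challenge")
#   print(ds)  # shows the available splits and column names
```

Printing the loaded object is a quick way to confirm which splits and columns (input, target, generation arguments, responses, metrics) a given configuration actually exposes, since the description lists them only loosely.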
Provider:
unlearning-cleanslate



