unlearning-cleanslate/generations-olmo-3-7b-simnpo-gentle-bm25-10b
Source: Hugging Face. Updated: 2026-04-29. Indexed: 2026-05-03.
Download link:
https://hf-mirror.com/datasets/unlearning-cleanslate/generations-olmo-3-7b-simnpo-gentle-bm25-10b
Description:
This dataset provides multiple configurations for evaluating language model performance on reasoning tasks, with a focus on few-shot learning with Chain-of-Thought (CoT). The configurations cover ARC Challenge and a range of BBH (Big-Bench Hard) task variants: boolean expressions, causal judgement, date understanding, disambiguation QA, Dyck languages, formal fallacies, geometric shapes, hyperbaton, logical deduction (with three, five, and seven objects), movie recommendation, multistep arithmetic, navigation, object counting, penguins in a table, reasoning about colored objects, and ruin names. Each configuration contains a training split whose features include the document ID, input text, target answer, generation arguments, model responses, filtered responses, hash values, and scores, which together support assessing the model's generation and reasoning capabilities.
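As a rough illustration of how the per-row features described above might be used, the following minimal sketch recomputes an exact-match accuracy from rows that carry a target answer and filtered responses. The field names `doc_id`, `target`, and `filtered_resps`, the nesting of the response lists, and the sample rows are all assumptions made for illustration; the actual column names and scoring rule should be checked against the dataset itself.

```python
from typing import Any, Iterable, Mapping


def first_filtered_response(row: Mapping[str, Any]) -> str:
    """Return the first filtered response of a row as a stripped string.

    Assumes a hypothetical 'filtered_resps' field that holds a (possibly
    nested) list of strings; an empty list yields an empty string.
    """
    resp = row["filtered_resps"]
    # Unwrap nested lists such as [["(A)"]] until a scalar is reached.
    while isinstance(resp, (list, tuple)):
        if not resp:
            return ""
        resp = resp[0]
    return str(resp).strip()


def exact_match_accuracy(rows: Iterable[Mapping[str, Any]]) -> float:
    """Fraction of rows whose first filtered response equals the target."""
    rows = list(rows)
    if not rows:
        return 0.0
    hits = sum(
        first_filtered_response(r) == str(r["target"]).strip() for r in rows
    )
    return hits / len(rows)


# Hypothetical rows, invented for this sketch (not taken from the dataset).
sample_rows = [
    {"doc_id": 0, "target": "(A)", "filtered_resps": ["(A)"]},
    {"doc_id": 1, "target": "True", "filtered_resps": [["False"]]},
    {"doc_id": 2, "target": "4", "filtered_resps": [" 4 "]},
    {"doc_id": 3, "target": "(B)", "filtered_resps": []},
]

accuracy = exact_match_accuracy(sample_rows)
print(f"exact-match accuracy: {accuracy:.2f}")
```

On real data, the rows would presumably come from one configuration's training split, for example via the `datasets` library's `load_dataset` with the repository name above and a configuration name taken from the dataset page. The stored score columns should be preferred over this recomputation wherever they are present.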
Provider:
unlearning-cleanslate



