unlearning-cleanslate/generations-03-gemma-3-12b-simnpo-gentle-baseline-target-100-checkpoint-1419
Source: Hugging Face. Updated 2026-04-29; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/unlearning-cleanslate/generations-03-gemma-3-12b-simnpo-gentle-baseline-target-100-checkpoint-1419
Overview:
This dataset contains multiple configurations for evaluating a language model on a range of reasoning tasks, with an emphasis on few-shot chain-of-thought (CoT) prompting. The main configurations are: 1) arc_challenge: the AI2 Reasoning Challenge (ARC) Challenge set, with questions, answer choices, and correct answers, used to test reasoning ability; 2) the bbh_cot_fewshot_* series: few-shot chain-of-thought configurations based on BIG-Bench Hard (BBH) tasks, covering boolean expressions, causal judgement, date understanding, disambiguation QA, Dyck languages, formal fallacies, geometric shapes, hyperbaton, logical deduction (three, five, and seven objects), movie recommendation, multistep arithmetic, navigation, object counting, penguins in a table, reasoning about colored objects, and ruin names. Each sample includes the input, the target, generation arguments (such as sampling settings), model responses (resps), filtered responses, and scores. The dataset is intended for training and evaluating models on complex reasoning tasks, and supports few-shot and chain-of-thought prompting methods.
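The per-sample fields described above (input, target, model responses, filtered responses) are enough to recompute a simple exact-match accuracy. The following is a minimal sketch only: the records are hand-written placeholders, not rows from this dataset, and the key name `filtered_resps`, the nesting of `resps`, and the helper names `exact_match` and `accuracy` are assumptions made for illustration.

```python
from typing import Any

# Hypothetical records written by hand for illustration; NOT real dataset rows.
# Key names ("filtered_resps", nested "resps") are assumptions about the layout.
samples: list[dict[str, Any]] = [
    {"input": "not ( True ) and ( True ) is", "target": "False",
     "resps": [["... So the answer is False."]], "filtered_resps": ["False"]},
    {"input": "True and not False is", "target": "True",
     "resps": [["... So the answer is True."]], "filtered_resps": ["True"]},
    {"input": "False or not True is", "target": "False",
     "resps": [["... So the answer is True."]], "filtered_resps": ["True"]},
    {"input": "not not True is", "target": "True",
     "resps": [["... So the answer is True."]], "filtered_resps": [" True "]},
]


def exact_match(sample: dict[str, Any]) -> bool:
    """Compare the first filtered response with the target, ignoring outer whitespace."""
    filtered = sample.get("filtered_resps") or []
    if not filtered:
        return False  # no extracted answer counts as a miss
    return str(filtered[0]).strip() == str(sample["target"]).strip()


def accuracy(rows: list[dict[str, Any]]) -> float:
    """Fraction of rows whose filtered response matches the target exactly."""
    if not rows:
        return 0.0
    return sum(exact_match(row) for row in rows) / len(rows)


if __name__ == "__main__":
    print(f"exact-match accuracy: {accuracy(samples):.2f}")
```

Real rows would need to be loaded first, for example with the Hugging Face `datasets` library and a configuration name such as `arc_challenge`; the available split names and the exact column names should be checked against the dataset itself before relying on this sketch.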
Provider:
unlearning-cleanslate



