unlearning-cleanslate/generations-13-qwen3-8b-undial-baseline-target-100-checkpoint-1078
Source: Hugging Face. Updated 2026-04-29. Indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/unlearning-cleanslate/generations-13-qwen3-8b-undial-baseline-target-100-checkpoint-1078
Description:
This dataset contains multiple configurations for evaluating language models on reasoning and problem-solving tasks. The configurations are arc_challenge (the AI2 Reasoning Challenge) and a set of bbh_cot_fewshot_* tasks (few-shot chain-of-thought tasks from BIG-Bench Hard), covering boolean expressions, causal judgement, date understanding, disambiguation QA, Dyck languages, formal fallacies, geometric shapes, hyperbaton, logical deduction, movie recommendation, multistep arithmetic, navigation, object counting, penguins in a table, reasoning about colored objects, and ruin names. Each configuration stores, per example, a document ID, the document content (such as the question, input, target, and answer choices), the arguments (such as generation parameters), the list of model responses, the filtered responses, metrics, hash values, and scores, so both the model generations and their evaluation can be inspected. Each configuration has only a train split, with between 146 and 1172 examples, and is intended for few-shot or chain-of-thought settings.
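As a rough illustration of how one configuration might be loaded, the sketch below uses the Hugging Face `datasets` library. The repository ID is taken from the download link above. The helper `bbh_config_name` is hypothetical: it assumes each BBH configuration is named `bbh_cot_fewshot_` followed by the topic in snake case, which this page does not confirm, so `list_configs` is the safer way to get the actual names.

```python
from typing import Any, List

# Repository ID taken from the download link on this page.
REPO_ID = (
    "unlearning-cleanslate/"
    "generations-13-qwen3-8b-undial-baseline-target-100-checkpoint-1078"
)


def bbh_config_name(topic: str) -> str:
    """Hypothetical helper: build a BBH configuration name from a topic.

    ASSUMPTION: the suffix is the topic in snake case. Verify against
    list_configs() before relying on it.
    """
    return "bbh_cot_fewshot_" + topic.strip().lower().replace(" ", "_")


def list_configs() -> List[str]:
    """Return the configuration names published for the repository.

    Requires network access and the `datasets` package.
    """
    from datasets import get_dataset_config_names  # imported lazily

    return get_dataset_config_names(REPO_ID)


def load_config(config_name: str = "arc_challenge") -> Any:
    """Load the train split of one configuration.

    The description states that only a train split exists.
    Requires network access and the `datasets` package.
    """
    from datasets import load_dataset  # imported lazily

    return load_dataset(REPO_ID, config_name, split="train")
```

Calling `load_config("arc_challenge")` would then return the rows for that configuration, and its `column_names` attribute shows the exact field names, which the description above gives only in prose.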
Provider:
unlearning-cleanslate



