
unlearning-cleanslate/generations-gemma-3-12b-simnpo-gentle-baseline

Hugging Face. Updated 2026-04-29. Indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/unlearning-cleanslate/generations-gemma-3-12b-simnpo-gentle-baseline
Description:
This dataset contains multiple configurations for evaluating language model performance on reasoning and problem-solving tasks. The main configurations cover the ARC Challenge and the BBH (Big-Bench Hard) chain-of-thought few-shot tasks, including boolean expressions, causal judgement, date understanding, disambiguation QA, Dyck languages, formal fallacies, geometric shapes, hyperbaton, logical deduction (three, five, and seven objects), movie recommendation, multistep arithmetic, navigation, object counting, penguins in a table, reasoning about colored objects, and ruin names, among others. Each configuration has structured features such as questions, answers, choices, generation arguments, model responses, filtered responses, evaluation metrics, and scores, and is intended to test few-shot learning and reasoning on complex tasks. The data is provided as a train split, with example counts and size information included.
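As a rough illustration of how the per-example scores described above could be summarized, the following sketch averages a score within each configuration. It works on in-memory dictionaries only. The key names `config`, `filtered_resps`, `target`, and `exact_match`, the configuration names, and the rows themselves are assumptions made for this example; they are not confirmed column names or real rows from the dataset.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_config(records, score_key="exact_match"):
    """Average a per-example score within each configuration.

    `records` is an iterable of dicts. The keys "config" and `score_key`
    are hypothetical names chosen for this sketch, not confirmed columns.
    """
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec["config"]].append(float(rec[score_key]))
    return {config: mean(scores) for config, scores in buckets.items()}


# Made-up rows for illustration only; not taken from the dataset.
example_records = [
    {"config": "bbh_boolean_expressions", "filtered_resps": ["True"],
     "target": "True", "exact_match": 1.0},
    {"config": "bbh_boolean_expressions", "filtered_resps": ["False"],
     "target": "True", "exact_match": 0.0},
    {"config": "arc_challenge", "filtered_resps": ["B"],
     "target": "B", "exact_match": 1.0},
]

summary = mean_score_by_config(example_records)
for config, score in sorted(summary.items()):
    print(f"{config}: {score:.2f}")
```

With real data, the rows would first need to be loaded from the download link and the key names adjusted to whatever columns each configuration actually exposes.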
Provider:
unlearning-cleanslate