unlearning-cleanslate/generations-12-llama-3_1-8b-undial-baseline-target-100-checkpoint-1722

Source: Hugging Face. Updated 2026-04-29. Indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/unlearning-cleanslate/generations-12-llama-3_1-8b-undial-baseline-target-100-checkpoint-1722
Description:
This dataset contains multiple configurations for evaluating language models on reasoning and commonsense tasks. The main configurations are ARC Challenge, a question-answering set of science questions with answer keys, answer options, and question text, and a series of BIG-Bench Hard (BBH) chain-of-thought (CoT) few-shot tasks: boolean expressions, causal judgement, date understanding, disambiguation QA, Dyck languages, formal fallacies, geometric shapes, hyperbaton, logical deduction (three, five, and seven objects), movie recommendation, multistep arithmetic, navigate, object counting, penguins in a table, reasoning about colored objects, and ruin names. Each configuration records a document ID, the input text, the target output, the generation parameters (such as sampling settings, maximum number of generated tokens, and temperature), the model responses, the filtered responses, the filter applied, the evaluation metrics, hash values, and a score. The dataset is intended for training and evaluating the reasoning ability of language models in few-shot settings, and may also be used for benchmarking or fine-tuning. Configurations range from a few hundred to a few thousand examples, and the total download size is a few megabytes.
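The per-record features described above (target, filtered response, score) are enough to aggregate or re-check a configuration's accuracy. The following is a minimal sketch only: the field names `doc_id`, `target`, `filtered_resps`, and `exact_match`, and the sample records themselves, are assumptions made for illustration and are not confirmed by the dataset page.

```python
from statistics import mean

# Hypothetical records shaped like the features listed in the description.
# All field names and values here are illustrative assumptions.
SAMPLE_RECORDS = [
    {"doc_id": 0, "target": "True", "filtered_resps": ["True"], "exact_match": 1.0},
    {"doc_id": 1, "target": "False", "filtered_resps": ["True"], "exact_match": 0.0},
    {"doc_id": 2, "target": "(A)", "filtered_resps": ["(A)"], "exact_match": 1.0},
    {"doc_id": 3, "target": "valid", "filtered_resps": [" valid"], "exact_match": 1.0},
]


def mean_score(records, score_key="exact_match"):
    """Average the stored per-record score for one configuration."""
    scores = [float(r[score_key]) for r in records if r.get(score_key) is not None]
    if not scores:
        raise ValueError(f"no records carry a '{score_key}' value")
    return mean(scores)


def rescore(records):
    """Recompute exact match from the first filtered response and the target.

    Whitespace is stripped on both sides before comparing; the real
    scoring rule may differ, so treat this as a sanity check only.
    """
    matches = [
        1.0 if r["filtered_resps"][0].strip() == r["target"].strip() else 0.0
        for r in records
    ]
    return mean(matches)


if __name__ == "__main__":
    print("stored score:", mean_score(SAMPLE_RECORDS))
    print("recomputed score:", rescore(SAMPLE_RECORDS))
```

In practice the records would come from one configuration of the dataset rather than from the inline sample, and the score column may carry a different name depending on the task's metric.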
Provider:
unlearning-cleanslate