unlearning-cleanslate/generations-08-llama-3_1-8b-simnpo-gentle-igm-10b-target-100-checkpoint-378
Source: Hugging Face. Updated: 2026-04-29. Indexed: 2026-05-03.
Download link:
https://hf-mirror.com/datasets/unlearning-cleanslate/generations-08-llama-3_1-8b-simnpo-gentle-igm-10b-target-100-checkpoint-378
Description:
This dataset collection contains multiple configurations for evaluating the reasoning ability of language models under few-shot chain-of-thought (CoT) prompting. The core configurations are: (1) arc_challenge, with 1172 training examples whose features include the question, answer choices, answer key, and model response; it is likely based on the AI2 Reasoning Challenge and tests commonsense reasoning. (2) Multiple bbh_cot_fewshot_* configurations (e.g., boolean expressions, causal judgement, date understanding), each with around 250 training examples, covering tasks such as logical reasoning, mathematics, and language understanding; their features include the input, target answer, model generation arguments, filtered responses, and scores. Each record pairs a structured prompt with the model's response, so the collection can be used to measure model performance on complex tasks for AI benchmarking and research.
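As a rough illustration of how the records described above might be consumed, the sketch below loads one configuration and computes an exact-match rate between target answers and filtered responses. It is a minimal sketch under stated assumptions, not the publisher's evaluation code: the field names `target` and `filtered_resps`, and the helper names `load_config` and `exact_match_rate`, are hypothetical and should be checked against the actual schema of each configuration. Only the repository ID, the `arc_challenge` configuration name, and the use of a training split come from the description.

```python
from typing import Any, Iterable, Mapping

# Repository ID taken from the listing above.
REPO_ID = (
    "unlearning-cleanslate/"
    "generations-08-llama-3_1-8b-simnpo-gentle-igm-10b-target-100-checkpoint-378"
)


def load_config(config_name: str, split: str = "train"):
    """Load one configuration from the Hub.

    Requires the `datasets` package and network access. The import is
    done lazily so the scoring helper below also works offline.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split)


def exact_match_rate(
    records: Iterable[Mapping[str, Any]],
    target_key: str = "target",            # assumed field name
    response_key: str = "filtered_resps",  # assumed field name
) -> float:
    """Fraction of records whose first filtered response equals the target."""
    total = 0
    hits = 0
    for rec in records:
        resp = rec[response_key]
        # Assumption: responses may be stored as a list; use the first entry.
        if isinstance(resp, (list, tuple)):
            resp = resp[0] if resp else ""
        total += 1
        hits += int(str(resp).strip() == str(rec[target_key]).strip())
    return hits / total if total else 0.0
```

A call such as `exact_match_rate(load_config("arc_challenge"))` would only work if that configuration actually exposes the assumed fields; otherwise pass the correct `target_key` and `response_key`, or read the precomputed score column directly.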
Provider:
unlearning-cleanslate



