BigBench Lite (BBL)
Source: arXiv. Indexed 2025-09-30.
Download link:
https://doi.org/10.5281/zenodo.14630714
Description:
This dataset is a curated subset of BIG-bench, chosen to balance computational feasibility against task complexity while covering a range of reasoning and comprehension abilities. It includes seven tasks: Formal Fallacies Syllogisms Negation, Known Unknowns, Logical Reasoning 3, Play Dialog Same or Different, Strange Stories (Boolean), StrategyQA, and Winowhy. The tasks have been used to evaluate multiple large language models.



