KnowLogic

Source: GitHub | Updated: 2025-03-11 | Indexed: 2025-03-12

Download link:

https://github.com/pokerwf/KnowLogic

Description:

The KnowLogic benchmark is generated via a knowledge-driven synthetic data strategy that integrates diverse commonsense knowledge, plausible scenarios, and various types of logical reasoning. A key advantage of KnowLogic is its adjustable difficulty levels, which enable flexible control over the complexity of the problems. It also includes fine-grained labels for in-depth evaluation of LLMs' reasoning capabilities across multiple dimensions. The benchmark comprises 3,000 bilingual (Chinese and English) questions spanning multiple domains.

Created:

2025-03-05

Summary of original information

KnowLogic dataset overview

Basic information

Name: KnowLogic
Type: Logical reasoning benchmark
Languages: Bilingual (Chinese and English)
Size: 3,000 questions
Domains: Multiple domains
Released: 2025-03-11
Paper: KnowLogic: A Benchmark for Commonsense Reasoning via Knowledge-Driven Data Synthesis (arXiv:2503.06218)

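Dataset features

Knowledge-driven: Combines diverse commonsense knowledge, plausible scenarios, and multiple types of logical reasoning.
Adjustable difficulty: Allows flexible control over question complexity.
Fine-grained labels: Support in-depth, multi-dimensional evaluation of LLMs' reasoning capabilities.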
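
Fine-grained labels make it possible to break accuracy down by dimension. The following is a minimal sketch, assuming each graded record is a dict that carries label fields alongside the gold answer and the model prediction. The field names (`domain`, `difficulty`, `answer`, `prediction`) and all values are hypothetical and may not match the released data format.

```python
from collections import defaultdict


def accuracy_by_label(records, label_key):
    """Group graded records by one label field and return accuracy per label value."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        label = rec[label_key]
        totals[label] += 1
        if rec["prediction"] == rec["answer"]:
            correct[label] += 1
    return {label: correct[label] / totals[label] for label in totals}


# Hypothetical records: field names and values are illustrative only.
records = [
    {"domain": "domain_a", "difficulty": 1, "answer": "A", "prediction": "A"},
    {"domain": "domain_a", "difficulty": 2, "answer": "B", "prediction": "C"},
    {"domain": "domain_b", "difficulty": 1, "answer": "D", "prediction": "D"},
    {"domain": "domain_b", "difficulty": 2, "answer": "A", "prediction": "A"},
]

by_domain = accuracy_by_label(records, "domain")
by_difficulty = accuracy_by_label(records, "difficulty")
```

Calling the same function with a different `label_key` gives a different slice of the results, so one graded run can be inspected along every label the data provides.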

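Data generation pipeline

Scenario definition: Select entities/events from the knowledge base, integrate them into a scenario framework, and generate the introductory text with natural-language templates.
Reasoning data generation: Use a reasoner to expand the initial facts into a fact base with detailed features.
Question design: Generate multiple question types from the fact set and the ground-truth arrangement of the entities/events.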
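
The three stages can be illustrated with a toy example. This is only a sketch under stated assumptions, not the authors' implementation: the queue scenario, the entity names, the `before` relation, the transitive-closure "reasoner", and every function name here are invented for illustration.

```python
import itertools
import random

KNOWLEDGE_BASE = ["Alice", "Bob", "Carol", "Dave", "Erin"]  # hypothetical entities


def define_scenario(rng, n_entities):
    """Stage 1: pick entities, fix a ground-truth order, render intro text from a template."""
    arrangement = rng.sample(KNOWLEDGE_BASE, n_entities)  # list order is the ground truth
    intro = "{} are standing in a single queue.".format(", ".join(sorted(arrangement)))
    return arrangement, intro


def initial_facts(arrangement):
    """State only the adjacent 'before' relations as the initial facts."""
    return {(a, b) for a, b in zip(arrangement, arrangement[1:])}


def expand_facts(facts):
    """Stage 2: toy reasoner that takes the transitive closure of 'before'."""
    closed = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in itertools.product(list(closed), repeat=2):
            if b == c and (a, d) not in closed:
                closed.add((a, d))
                changed = True
    return closed


def make_question(rng, arrangement, fact_base):
    """Stage 3: build a true/false question whose answer is read off the fact base."""
    x, y = rng.sample(arrangement, 2)
    return {
        "question": f"Is {x} somewhere before {y} in the queue?",
        "answer": (x, y) in fact_base,
    }


rng = random.Random(0)
arrangement, intro = define_scenario(rng, n_entities=4)
given = initial_facts(arrangement)
fact_base = expand_facts(given)
item = make_question(rng, arrangement, fact_base)
```

In this toy version, `n_entities` acts as a crude difficulty knob: more entities mean longer inference chains between the given facts and the answer. How KnowLogic actually controls difficulty is not specified here.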

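Evaluation results

Best-performing model: O1-Preview
Open-source models: DeepSeek-R1 performs relatively poorly
Overall conclusion: The benchmark is challenging and effectively highlights the limitations of current models across domains.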

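Knowledge entries

Type: Manually written knowledge entries
Placeholders:
- $X$ and $Y$: entity placeholders
- $A$ and $B$: event placeholders
- $T$: time placeholder
- $V$: natural attribute value placeholder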

Citation

@article{zhan2025knowlogicbenchmarkcommonsensereasoning,
  title={KnowLogic: A Benchmark for Commonsense Reasoning via Knowledge-Driven Data Synthesis},
  author={Weidong Zhan and Yue Wang and Nan Hu and Liming Xiao and Jingyuan Ma and Yuhang Qin and Zheng Li and Yixin Yang and Sirui Deng and Jinkun Ding and Wenhan Ma and Rui Li and Weilin Luo and Qun Liu and Zhifang Sui},
  year={2025},
  eprint={2503.06218},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2503.06218},
}