Multi-LogiEval

arXiv, indexed 2025-09-30

Logical reasoning

Natural language processing

Data link:

https://github.com/Mihir3009/Multi-LogiEval

Overview:

Multi-LogiEval is a comprehensive evaluation benchmark for multi-step logical reasoning. It covers a range of inference rules and reasoning depths across three logic types: propositional logic, first-order logic, and non-monotonic logic. The dataset draws on more than 30 inference rules and includes more than 60 combinations of those rules. It supports evaluating large language models in zero-shot and 3-shot settings, and the task is framed as binary classification.
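Because the task is framed as binary classification, scoring a model comes down to comparing yes/no answers against gold labels. The following is a minimal sketch of such a scorer, not the benchmark's official evaluation script. The function names `parse_label` and `accuracy`, the "yes"/"no" label strings, and the rule that unparseable replies count as wrong are all assumptions made for illustration.

```python
from typing import Optional


def parse_label(text: str) -> Optional[bool]:
    """Map a free-form reply to True ("yes"), False ("no"), or None.

    Assumption: the answer word comes first in the reply.
    """
    words = text.strip().lower().replace(".", " ").replace(",", " ").split()
    if not words:
        return None
    if words[0] == "yes":
        return True
    if words[0] == "no":
        return False
    return None


def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of items whose parsed prediction matches the gold label.

    Assumption: predictions that cannot be parsed are scored as wrong.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("no items to score")
    correct = 0
    for pred, ref in zip(predictions, gold):
        parsed = parse_label(pred)
        if parsed is not None and parsed == parse_label(ref):
            correct += 1
    return correct / len(gold)


if __name__ == "__main__":
    # Hypothetical model replies and gold labels, for illustration only.
    replies = ["Yes.", "no", "maybe", "No, because the premise fails"]
    labels = ["yes", "no", "yes", "yes"]
    print(accuracy(replies, labels))
```

Reporting accuracy separately for each logic type and reasoning depth, rather than as a single number, would show where performance changes as the inference chains get longer.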

Collected Summary

Dataset Introduction