Com2
Source: arXiv. Updated: 2025-06-08. Indexed: 2025-11-28.
Download link:
https://github.com/Waste-Wood/Com2
Description:
The Com2 dataset was proposed by the Social Computing and Interactive Robotics Research Center at Harbin Institute of Technology to evaluate how well large language models (LLMs) handle complex commonsense reasoning. It is built on causal event graphs: causal theory is used to generate varied reasoning scenarios, and logical relationships guide the synthesis of the reasoning tasks. The dataset covers five task types (direct, decision-making, transitional, intervention, and counterfactual) and adds a harder subset, Com2-hard, constructed from detective stories. Experiments show that LLMs still struggle with reasoning depth and breadth, even with methods such as slow thinking, although post-training and slow thinking mitigate these weaknesses.
Provider:
Social Computing and Interactive Robotics Research Center, Harbin Institute of Technology
Created:
2025-06-08



