Com2

Source: arXiv | Updated: 2025-06-08 | Indexed: 2025-11-28
Download link:
https://github.com/Waste-Wood/Com2
Overview:
The Com2 dataset was proposed by the Research Center for Social Computing and Interactive Robotics at Harbin Institute of Technology to evaluate how well large language models (LLMs) handle complex commonsense reasoning. It is built on causal event graphs: causal theory is used to generate diverse reasoning scenarios, and logical relations guide the synthesis of the reasoning tasks. The dataset covers five task types (direct, decision, transition, intervention, and counterfactual) and includes a harder subset, Com2-hard, built from detective stories. Experiments show that LLMs still struggle with reasoning depth and breadth even when methods such as slow thinking are applied, although post-training and slow thinking can mitigate these problems.
Provided by:
Research Center for Social Computing and Interactive Robotics, Harbin Institute of Technology
Created:
2025-06-08