CausalBench
Source: arXiv · Updated: 2024-04-09 · Indexed: 2024-08-06
Download link:
http://arxiv.org/abs/2404.06349v1
Overview:
CausalBench is a comprehensive benchmark for evaluating the causal learning capabilities of large language models (LLMs). It comprises 15 widely used real-world datasets from the causal research community, with causal networks ranging from 2 to 109 nodes, so that the upper limits of LLM performance can be probed across task scenarios of varying difficulty. Besides the causal networks themselves, CausalBench integrates background knowledge and structured data, which tests how well LLMs handle long-text comprehension and make use of prior information. CausalBench is used to evaluate 19 leading LLMs, revealing their strengths and weaknesses in understanding causal relationships.
Provided by:
Department of Computing, The Hong Kong Polytechnic University
Created:
2024-04-09



