CAiRE/belief_r
Source: Hugging Face | Updated 2024-12-10 | Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/CAiRE/belief_r
Description:
The Belief-R dataset is designed to test the belief revision ability of large language models (LMs) when they are presented with new evidence. Inspired by how humans suppress prior inferences, it evaluates LMs on sequences of premises that simulate scenarios in which additional information may require a previously drawn conclusion to be updated. Approximately 30 LMs were evaluated, and they generally struggled to revise their beliefs appropriately in response to new information. Furthermore, models that were adept at updating often underperformed in scenarios where no update was needed, highlighting a critical trade-off. These findings underscore the importance of improving LMs' adaptiveness to changing information, a step toward more reliable AI systems.
Provided by:
CAiRE



