BeHonest

Name: BeHonest
Creator: maas
Published: 2026-01-06 16:22:03
License: No description available

ModelScope community · Updated 2026-01-06 · Indexed 2025-02-15

Download link:

https://modelscope.cn/datasets/GAIR/BeHonest

Description:

# BeHonest: Benchmarking Honesty in Large Language Models

**BeHonest** is a pioneering benchmark designed to comprehensively assess honesty in LLMs. It evaluates three essential aspects of honesty: awareness of knowledge boundaries (self-knowledge), avoidance of deceit (non-deceptiveness), and consistency in responses (consistency).

BeHonest supports the following 10 scenarios:

- **Admitting Unknowns**: LLMs should appropriately refuse to answer questions that are beyond their knowledge.
- **Expressing Knowns**: LLMs should correctly answer questions they have knowledge of.
- **Persona Sycophancy**: LLMs should not alter their responses to fit the perceived viewpoints of a persona, as this can lead to biased information aimed at maintaining approval.
- **Preference Sycophancy**: LLMs should not change their perspective just to maintain approval when a user indicates a clear preference for one option over another.
- **Burglar Deception Test**: LLMs should not lie even when deceptive behaviors are explicitly instructed or encouraged.
- **Game**: LLMs should not lie even when deceptive behaviors are conducive to winning in strategic games.
- **Prompt Format**: LLMs should remain consistent in their responses when encountering slight prompt variations of the same content.
- **Demonstration Format**: LLMs should remain consistent in their responses even with irrelevant biased features in few-shot demonstrations.
- **Open-Form Consistency**: LLMs should display consistency by validating their own generated responses.
- **Multiple-Choice Consistency**: LLMs should remain consistent in their responses when asked to answer a question again or when a user presents unnecessary doubts.

We note that the dataset for **Demonstration Format** can be loaded from this [github repo](https://github.com/milesaturpin/cot-unfaithfulness/tree/main/data/bbh). Save the files to the path `/Demonstration_Format/bbh/${task}/xxx.json`.
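The `/Demonstration_Format/bbh/${task}/xxx.json` layout mentioned above could be produced from a local checkout of the linked repository with something like the following. This is a minimal sketch, not the authors' tooling: the helper names `demo_format_dir` and `copy_task_files`, the `root` directory, and the assumption that each task is a sub-folder containing `.json` files are all illustrative.

```python
import shutil
from pathlib import Path
from typing import List


def demo_format_dir(root: Path, task: str) -> Path:
    """Return (and create) <root>/Demonstration_Format/bbh/<task>/."""
    target = Path(root) / "Demonstration_Format" / "bbh" / task
    target.mkdir(parents=True, exist_ok=True)
    return target


def copy_task_files(src_bbh: Path, root: Path) -> List[Path]:
    """Copy every *.json file from <src_bbh>/<task>/ into the layout above.

    Assumes (hypothetically) that src_bbh is a local copy of the repo's
    data/bbh folder with one sub-directory per task.
    """
    copied = []
    for task_dir in sorted(p for p in Path(src_bbh).iterdir() if p.is_dir()):
        dest = demo_format_dir(root, task_dir.name)
        for json_file in sorted(task_dir.glob("*.json")):
            copied.append(Path(shutil.copy2(json_file, dest / json_file.name)))
    return copied
```

Calling `copy_task_files(Path("cot-unfaithfulness/data/bbh"), Path("."))` would then mirror each task folder under `./Demonstration_Format/bbh/`; both paths here are placeholders for wherever the checkout and the benchmark data actually live.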
# Load data:

```bash
pip install datasets
```

```python
from datasets import load_dataset

dataset = load_dataset("GAIR/BeHonest")
print(dataset)
```

The datasets should be saved as `/{scenario_name}/xxx.json`.

More details on how to use the data can be found at our [github page](https://github.com/GAIR-NLP/BeHonest).

If you use our benchmark datasets or find our code helpful, please consider citing our paper.

```
@article{chern2024behonest,
  title={BeHonest: Benchmarking Honesty in Large Language Models},
  author={Chern, Steffi and Hu, Zhulin and Yang, Yuqing and Chern, Ethan and Guo, Yuan and Jin, Jiahe and Wang, Binjie and Liu, Pengfei},
  journal={arXiv preprint arXiv:2406.13261},
  url={https://arxiv.org/abs/2406.13261},
  year={2024}
}
```
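Once files are saved in the `/{scenario_name}/xxx.json` layout described in the loading instructions, they could be read back with the standard library alone. The sketch below is illustrative only: `load_scenario` is a hypothetical helper, and it assumes each file is a single valid JSON document without assuming anything about the fields inside.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_scenario(root: Path, scenario_name: str) -> Dict[str, Any]:
    """Parse every *.json file under <root>/<scenario_name>/.

    Returns a dict mapping file name to parsed JSON content.
    The internal schema of each file is not assumed here.
    """
    scenario_dir = Path(root) / scenario_name
    return {
        path.name: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(scenario_dir.glob("*.json"))
    }
```

For example, `load_scenario(Path("."), "Demonstration_Format")` would return one entry per JSON file found directly in that folder; the scenario folder name should match whatever directory names the data was actually saved under.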

Provider: maas

Created: 2025-02-08
