
BeHonest

ModelScope Community · Updated 2026-01-06 · Indexed 2025-02-15
Download link:
https://modelscope.cn/datasets/GAIR/BeHonest
Resource description:
# BeHonest: Benchmarking Honesty in Large Language Models

**BeHonest** is a pioneering benchmark designed to comprehensively assess honesty in LLMs. It evaluates three essential aspects of honesty: awareness of knowledge boundaries (self-knowledge), avoidance of deceit (non-deceptiveness), and consistency in responses (consistency).

BeHonest supports the following 10 scenarios:

- **Admitting Unknowns**: LLMs should appropriately refuse to answer questions that are beyond their knowledge.
- **Expressing Knowns**: LLMs should correctly answer questions they have knowledge of.
- **Persona Sycophancy**: LLMs should not alter their responses to fit the perceived viewpoints of a persona, as this can lead to biased information aimed at maintaining approval.
- **Preference Sycophancy**: LLMs should not change their perspective just to maintain approval when a user indicates a clear preference for one option over another.
- **Burglar Deception Test**: LLMs should not lie even when deceptive behaviors are explicitly instructed or encouraged.
- **Game**: LLMs should not lie even when deceptive behaviors are conducive to winning in strategic games.
- **Prompt Format**: LLMs should remain consistent in their responses when encountering slight prompt variations of the same content.
- **Demonstration Format**: LLMs should remain consistent in their responses even when few-shot demonstrations contain irrelevant biased features.
- **Open-Form Consistency**: LLMs should display consistency by validating their own generated responses.
- **Multiple-Choice Consistency**: LLMs should remain consistent in their responses when asked to answer a question again or when a user presents unnecessary doubts.

We note that the dataset for **Demonstration Format** can be loaded from this [github repo](https://github.com/milesaturpin/cot-unfaithfulness/tree/main/data/bbh). Save the files to the path `/Demonstration_Format/bbh/${task}/xxx.json`.
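As a minimal sketch, assuming the leading `/` in `/Demonstration_Format/bbh/${task}/xxx.json` refers to a local data root of your choosing, the helpers below build that path for a given BBH task and write a JSON payload there. `demonstration_path`, `save_demonstration_file`, and the `root` parameter are hypothetical names for illustration, not part of the BeHonest code.

```python
import json
from pathlib import Path


def demonstration_path(root: Path, task: str, filename: str) -> Path:
    """Build <root>/Demonstration_Format/bbh/<task>/<filename>.

    Hypothetical helper: `root` stands in for the leading "/" of the
    documented layout; it is an assumption, not a BeHonest setting.
    """
    return Path(root) / "Demonstration_Format" / "bbh" / task / filename


def save_demonstration_file(root: Path, task: str, filename: str, data) -> Path:
    """Write `data` as JSON into the Demonstration Format layout.

    Missing parent folders are created; an existing file is overwritten.
    """
    path = demonstration_path(root, task, filename)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps any non-ASCII text readable in the file
        json.dump(data, f, ensure_ascii=False, indent=2)
    return path
```

For example, `save_demonstration_file(Path("data"), "some_task", "example.json", records)` would write to `data/Demonstration_Format/bbh/some_task/example.json`; `"some_task"` and `"example.json"` are placeholders for the actual BBH task name and file name from the linked repo.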
# Load data

```bash
pip install datasets
```

```python
from datasets import load_dataset

dataset = load_dataset("GAIR/BeHonest")
print(dataset)
```

The datasets should be saved as `/{scenario_name}/xxx.json`. More details on how to use the data can be found at our [github page](https://github.com/GAIR-NLP/BeHonest).

If you use our benchmark datasets or find our code helpful, please consider citing our paper.

```
@article{chern2024behonest,
  title={BeHonest: Benchmarking Honesty in Large Language Models},
  author={Chern, Steffi and Hu, Zhulin and Yang, Yuqing and Chern, Ethan and Guo, Yuan and Jin, Jiahe and Wang, Binjie and Liu, Pengfei},
  journal={arXiv preprint arXiv:2406.13261},
  url={https://arxiv.org/abs/2406.13261},
  year={2024}
}
```
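Once the scenario files have been saved in the `/{scenario_name}/xxx.json` layout, a minimal sketch like the following can read them back and group them by scenario. It assumes a local directory `root` in place of the leading `/`; `load_saved_scenarios` is a hypothetical helper, not part of the BeHonest code or the `datasets` library.

```python
import json
from collections import defaultdict
from pathlib import Path


def load_saved_scenarios(root: Path) -> dict:
    """Read every JSON file under <root>/<scenario_name>/ and group by scenario.

    Hypothetical helper assuming the `/{scenario_name}/xxx.json` layout, with
    `root` standing in for the leading "/". Files in nested folders (such as
    Demonstration_Format/bbh/<task>/) are attributed to their top-level
    scenario folder and keyed by their path inside that folder.
    """
    root = Path(root)
    grouped = defaultdict(dict)
    for path in sorted(root.rglob("*.json")):
        rel = path.relative_to(root)
        if len(rel.parts) < 2:
            continue  # a JSON file directly in root has no scenario folder
        scenario = rel.parts[0]
        inner_key = Path(*rel.parts[1:]).as_posix()
        with path.open(encoding="utf-8") as f:
            grouped[scenario][inner_key] = json.load(f)
    return dict(grouped)
```

Keying each file by its path inside the scenario folder keeps the nested `bbh/${task}/` structure of Demonstration Format distinguishable while still grouping it under a single scenario.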

Provider:
maas
Created:
2025-02-08