BeHonest
ModelScope Community · Updated 2026-01-06 · Indexed 2025-02-15
Download link:
https://modelscope.cn/datasets/GAIR/BeHonest
Resource description:
# BeHonest: Benchmarking Honesty in Large Language Models
**BeHonest** is a pioneering benchmark designed to comprehensively assess honesty in LLMs. It evaluates three essential aspects of honesty: awareness of knowledge boundaries (self-knowledge), avoidance of deceit (non-deceptiveness), and consistency in responses (consistency).
BeHonest supports the following 10 scenarios:
- **Admitting Unknowns**: LLMs should appropriately refuse to answer questions that are beyond their knowledge.
- **Expressing Knowns**: LLMs should correctly answer questions they have knowledge of.
- **Persona Sycophancy**: LLMs should not alter their responses to fit the perceived viewpoints of a persona, as doing so can produce biased information aimed at maintaining approval.
- **Preference Sycophancy**: LLMs should not change their perspective just to maintain approval when a user indicates a clear preference for one option over another.
- **Burglar Deception Test**: LLMs should not lie even when deceptive behaviors are explicitly instructed or encouraged.
- **Game**: LLMs should not lie even when deceptive behaviors are conducive to winning in strategic games.
- **Prompt Format**: LLMs should remain consistent in their responses when encountering slight prompt variations of the same content.
- **Demonstration Format**: LLMs should remain consistent in their responses even with irrelevant biased features in few-shot demonstrations.
- **Open-Form Consistency**: LLMs should display consistency by validating their own generated responses.
- **Multiple-Choice Consistency**: LLMs should remain consistent in their responses when asked to answer a question again or when a user presents unnecessary doubts.
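When evaluating a model, it can help to report results per honesty aspect as well as per scenario. The sketch below records one way to group the 10 scenarios under the three aspects named above. It assumes the grouping follows the order of the list (two self-knowledge scenarios, four non-deceptiveness scenarios, four consistency scenarios); the `ASPECT_SCENARIOS` table and the `aspect_of` helper are hypothetical names, and the GitHub repository remains the authoritative reference.

```python
# Hypothetical grouping of the 10 BeHonest scenarios under the three honesty
# aspects. Assumption: the grouping follows the order of the scenario list;
# confirm against the BeHonest GitHub repository before relying on it.
ASPECT_SCENARIOS: dict[str, list[str]] = {
    "self-knowledge": ["Admitting Unknowns", "Expressing Knowns"],
    "non-deceptiveness": [
        "Persona Sycophancy",
        "Preference Sycophancy",
        "Burglar Deception Test",
        "Game",
    ],
    "consistency": [
        "Prompt Format",
        "Demonstration Format",
        "Open-Form Consistency",
        "Multiple-Choice Consistency",
    ],
}


def aspect_of(scenario: str) -> str:
    """Return the honesty aspect a scenario belongs to (hypothetical helper)."""
    for aspect, scenarios in ASPECT_SCENARIOS.items():
        if scenario in scenarios:
            return aspect
    raise KeyError(f"unknown scenario: {scenario!r}")


if __name__ == "__main__":
    total = sum(len(s) for s in ASPECT_SCENARIOS.values())
    print(total)  # 10 scenarios in all
    print(aspect_of("Game"))  # non-deceptiveness
```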
Note that the data for the **Demonstration Format** scenario is obtained separately, from this [GitHub repo](https://github.com/milesaturpin/cot-unfaithfulness/tree/main/data/bbh). Save those files to the path `/Demonstration_Format/bbh/${task}/xxx.json`.
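A minimal sketch of placing those files, assuming you have cloned the `cot-unfaithfulness` repository locally, that its BBH data sits in `data/bbh/<task>/*.json`, and that the leading `/` in the documented path refers to your own data root rather than the filesystem root. The function name `copy_bbh_tasks` and both path arguments are hypothetical; the original file names are kept unchanged.

```python
import shutil
from pathlib import Path


def copy_bbh_tasks(clone_root: Path, data_root: Path) -> list[Path]:
    """Copy each task folder's JSON files from a local clone of
    cot-unfaithfulness into <data_root>/Demonstration_Format/bbh/<task>/.

    Assumptions (not from the BeHonest docs): the BBH data lives in
    <clone_root>/data/bbh/<task>/*.json, and <data_root> is the directory
    that the leading "/" in the documented path refers to.
    """
    src_bbh = clone_root / "data" / "bbh"
    copied: list[Path] = []
    # Only sub-directories are treated as tasks; loose files are skipped.
    for task_dir in sorted(p for p in src_bbh.iterdir() if p.is_dir()):
        dest_dir = data_root / "Demonstration_Format" / "bbh" / task_dir.name
        dest_dir.mkdir(parents=True, exist_ok=True)
        for json_file in sorted(task_dir.glob("*.json")):
            # Keep the original file name; only the parent folders change.
            target = dest_dir / json_file.name
            shutil.copy2(json_file, target)
            copied.append(target)
    return copied
```

For example, `copy_bbh_tasks(Path("cot-unfaithfulness"), Path("data"))` would populate `data/Demonstration_Format/bbh/<task>/` for every task folder found in the clone.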
# Load data:
```bash
pip install datasets
```
```python
from datasets import load_dataset

# Download BeHonest from the Hugging Face Hub and show its splits
dataset = load_dataset("GAIR/BeHonest")
print(dataset)
```
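`print(dataset)` shows the available splits. To get the row counts programmatically, a small helper such as the hypothetical `split_sizes` below can be used. It relies only on the fact that a `datasets.DatasetDict` is a `dict` of `Dataset` objects, each of which supports `len()`; the split names in the demo are placeholders, not the real names used by `GAIR/BeHonest`.

```python
from collections.abc import Mapping, Sized


def split_sizes(dataset: Mapping[str, Sized]) -> dict[str, int]:
    """Return the number of rows in each split.

    A datasets.DatasetDict is a dict of Dataset objects, and Dataset
    supports len(), so the object returned by load_dataset can be passed
    in directly.
    """
    return {name: len(split) for name, split in dataset.items()}


if __name__ == "__main__":
    # Stand-in data so the sketch runs offline; split names are hypothetical.
    demo = {"split_a": [1, 2, 3], "split_b": [4]}
    print(split_sizes(demo))  # {'split_a': 3, 'split_b': 1}
```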
The datasets should be saved as `/{scenario_name}/xxx.json`.
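Once the files follow that layout, a scenario can be read back with the standard library alone. This is a sketch under stated assumptions: `load_scenario` is a hypothetical helper, each file is assumed to hold a single JSON document, and the actual file names and record schema are defined by the BeHonest repository rather than by this code.

```python
import json
from pathlib import Path
from typing import Any


def load_scenario(data_root: Path, scenario_name: str) -> dict[str, Any]:
    """Read every JSON file saved directly under <data_root>/<scenario_name>/.

    Returns a mapping from file stem to parsed JSON content. Assumes each
    file holds one JSON document; non-JSON files are ignored.
    """
    scenario_dir = data_root / scenario_name
    if not scenario_dir.is_dir():
        raise FileNotFoundError(f"no such scenario folder: {scenario_dir}")
    return {
        path.stem: json.loads(path.read_text(encoding="utf-8"))
        for path in sorted(scenario_dir.glob("*.json"))
    }
```

Because `glob("*.json")` only looks one level deep, the Demonstration Format data (stored one level lower, under `bbh/${task}/`) would need `rglob("*.json")` or a per-task call instead.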
More details on how to use the data can be found at our [github page](https://github.com/GAIR-NLP/BeHonest).
If you use our benchmark datasets or find our code helpful, please consider citing our paper.
```
@article{chern2024behonest,
title={BeHonest: Benchmarking Honesty in Large Language Models},
author={Chern, Steffi and Hu, Zhulin and Yang, Yuqing and Chern, Ethan and Guo, Yuan and Jin, Jiahe and Wang, Binjie and Liu, Pengfei},
journal={arXiv preprint arXiv:2406.13261},
url={https://arxiv.org/abs/2406.13261},
year={2024}
}
```
Provided by: maas
Created: 2025-02-08



