apart/darkbench
Source: Hugging Face | Updated: 2025-06-16 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/apart/darkbench
Description:
DarkBench is a comprehensive benchmark designed to detect dark patterns in large language models (LLMs). It consists of 660 prompts spanning six categories of dark patterns: brand bias, user retention, sycophancy, anthropomorphism, harmful generation, and sneaking. The benchmark was used to evaluate 14 models from leading AI companies, including OpenAI, Anthropic, Meta, Mistral, and Google. The research found that dark patterns appeared in 48% of all tested conversations on average, with sneaking the most common category and sycophancy the least common.
Provider:
apart



