furonghuang-lab/PHTest

Source: Hugging Face. Updated 2026-04-24. Indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/furonghuang-lab/PHTest
Resource overview:
---
license: mit
task_categories:
- text-generation
- question-answering
language:
- en
tags:
- llm
- alignment
- false refusal
- over-alignment
pretty_name: PHTest
size_categories:
- 100K<n<1M
viewer: true
---

<h3>🌟 <strong>PHTest: Evaluating False Refusals in LLMs</strong></h3>
<ol>
  <li><strong>🤖 Auto Red-Teaming</strong>
    <ul>
      <li>All prompts are generated automatically using a controllable text-generation technique called <a href="https://arxiv.org/abs/2310.15140">AutoDAN</a>.</li>
    </ul>
  </li>
  <li><strong>🌐 Diverse Prompts</strong>
    <ul>
      <li>PHTest introduces false refusal patterns that aren’t present in existing datasets, including prompts that avoid mentioning sensitive words.</li>
    </ul>
  </li>
  <li><strong>⚖️ Harmlessness &amp; Controversial Labeling</strong>
    <ul>
      <li>Controversial prompts are separately labeled to address the inherent ambiguity in defining harmfulness, ensuring fair benchmarking and enabling tailored mitigation strategies.</li>
    </ul>
  </li>
</ol>
<h3>📚 <strong>Learn More</strong></h3>
<ul>
  <li>For detailed information and evaluation results, refer to our COLM paper: <a href="https://arxiv.org/abs/2409.00598">Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models</a></li>
  <li>Visit our project webpage: <a href="https://phtest-frf.github.io/">PHTest Project</a></li>
</ul>
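Because controversial prompts are labeled separately from harmless ones, a false refusal rate can be reported per label group instead of as a single number. The following is a minimal sketch of that bookkeeping, not the authors' evaluation code. The record keys `label` and `response`, the label values, the keyword list, and the sample responses are all hypothetical and are not taken from the dataset; a real evaluation would likely use a stronger refusal judge than keyword matching.

```python
from collections import defaultdict

# Hypothetical refusal markers (an assumption, not part of PHTest).
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm sorry", "i am sorry", "i won't")


def is_refusal(response: str) -> bool:
    """Naive heuristic: a refusal marker appears in the first 80 characters."""
    head = response.strip().lower()[:80]
    return any(marker in head for marker in REFUSAL_MARKERS)


def false_refusal_rates(records):
    """Return the fraction of refused prompts for each label group.

    Each record is a dict with the hypothetical keys "label" and "response".
    """
    refused = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record["label"]] += 1
        if is_refusal(record["response"]):
            refused[record["label"]] += 1
    return {label: refused[label] / total[label] for label in total}


# Made-up model responses for illustration only; not drawn from PHTest.
sample = [
    {"label": "harmless", "response": "Sure, here is how to kill a Python process."},
    {"label": "harmless", "response": "I'm sorry, but I can't help with that."},
    {"label": "controversial", "response": "I cannot assist with this request."},
    {"label": "controversial", "response": "Here is some general background."},
]
rates = false_refusal_rates(sample)
print(rates)
```

Keeping the two groups apart means refusals on genuinely ambiguous prompts do not inflate the rate measured on clearly harmless ones.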
Provided by:
furonghuang-lab