SoftMINER-Group/TechHazardQA

Source: Hugging Face · Updated 2024-06-20 · Indexed 2024-03-04
Download link: https://hf-mirror.com/datasets/SoftMINER-Group/TechHazardQA
Resource description:
---
license: apache-2.0
---

**New Paper! 🎉 🎊**

## 🎉🎊 Released new paper on AI safety! 🎉 🎊

Check out our new paper **SafeInfer** at https://arxiv.org/abs/2406.12274 👈

Safety-aligned language models often exhibit fragile and imbalanced safety mechanisms, increasing the likelihood of generating unsafe content. In addition, incorporating new knowledge into language models through editing techniques can further compromise safety. To address these issues, we propose SafeInfer, a context-adaptive, decoding-time safety alignment strategy for generating safe responses to user queries. SafeInfer comprises two phases: the safety amplification phase, which uses safe demonstration examples to adjust the model's hidden states and increase the likelihood of safer outputs, and the safety-guided decoding phase, which influences token selection based on safety-optimized distributions so that the generated content complies with ethical guidelines. Further, we present HarmEval, a novel benchmark for extensive safety evaluations, designed to cover potential misuse scenarios in accordance with the policies of leading AI tech companies.

👉 *huggingface*: https://huggingface.co/papers/2406.12274
👉 *arxiv version*: https://arxiv.org/abs/2406.12274

---

# Releasing our new paper **How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries**

👉 Read our paper at https://arxiv.org/abs/2402.15302

🎯 If you are using this dataset, please cite our papers:

```
@article{DBLP:journals/corr/abs-2402-15302,
  author     = {Somnath Banerjee and Sayan Layek and Rima Hazra and Animesh Mukherjee},
  title      = {How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries},
  journal    = {CoRR},
  volume     = {abs/2402.15302},
  year       = {2024},
  url        = {https://doi.org/10.48550/arXiv.2402.15302},
  doi        = {10.48550/ARXIV.2402.15302},
  eprinttype = {arXiv},
  eprint     = {2402.15302},
  timestamp  = {Fri, 22 Mar 2024 12:19:03 +0100},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2402-15302.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}

@misc{banerjee2024safeinfer,
  title         = {SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models},
  author        = {Somnath Banerjee and Soham Tripathy and Sayan Layek and Shanu Kumar and Animesh Mukherjee and Rima Hazra},
  year          = {2024},
  eprint        = {2406.12274},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CL}
}
```

### 🌟 If you find our paper interesting and like the dataset, please support us by upvoting and sharing our work! 🌟
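The SafeInfer abstract above describes its two phases only in prose. The toy sketch below is meant to make those two ideas concrete and is not the authors' implementation: it assumes hidden states and logits are plain Python lists, builds a deliberately simplified steering vector as the mean hidden state of safe demonstrations, and substitutes a simple linear mixture of two next-token distributions for SafeInfer's actual safety-guided decoding rule. The function names and the `alpha` and `beta` weights are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Sequence[float]]) -> Vector:
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def amplify_safety(hidden: Sequence[float],
                   safe_demo_states: Sequence[Sequence[float]],
                   alpha: float = 1.0) -> Vector:
    """Toy 'safety amplification': add a steering vector (the mean hidden
    state of safe demonstrations, scaled by the hypothetical `alpha`)."""
    steer = mean_vector(safe_demo_states)
    return [h + alpha * s for h, s in zip(hidden, steer)]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def safety_guided_distribution(base_logits: Sequence[float],
                               safe_logits: Sequence[float],
                               beta: float = 0.5) -> Vector:
    """Toy 'safety-guided decoding': linearly mix the base next-token
    distribution with a safety-steered one; `beta` weights the safe side."""
    p_base = softmax(base_logits)
    p_safe = softmax(safe_logits)
    return [(1 - beta) * pb + beta * ps for pb, ps in zip(p_base, p_safe)]


def greedy_token(distribution: Sequence[float]) -> int:
    """Index of the most probable token."""
    return max(range(len(distribution)), key=lambda i: distribution[i])
```

For example, with `base_logits = [2.0, 1.0, 0.0]` and `safe_logits = [0.0, 1.0, 3.0]`, setting `beta=0.5` moves the greedy choice from token 0 to token 2, while `beta=0.0` reproduces the base distribution's choice. A linear mixture was chosen only because it is easy to inspect; consult the SafeInfer paper for the distributions it actually combines.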
Provided by:
SoftMINER-Group
Summary of original information

Overview of dataset-related content

Related papers

Paper 1

  • Title: How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries
  • Authors: Somnath Banerjee, Sayan Layek, Rima Hazra, Animesh Mukherjee
  • Year: 2024
  • Link: arXiv (2402.15302)
  • DOI: 10.48550/ARXIV.2402.15302

Paper 2

  • Title: SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models
  • Authors: Somnath Banerjee, Soham Tripathy, Sayan Layek, Shanu Kumar, Animesh Mukherjee, Rima Hazra
  • Year: 2024
  • Link: arXiv (2406.12274)
  • Primary category: Computation and Language (cs.CL)

Collected summary
Dataset introduction
Background and challenges
Background overview
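TechHazardQA is a dataset for evaluating how safely large language models respond to harmful queries, intended for research on AI ethics and model vulnerabilities. It is associated with several academic papers, including the SafeInfer safety-alignment method, and access requires users to agree not to use it for harmful experiments. The dataset is provided in CSV format, is relatively small (1K-10K records), and is released under the Apache-2.0 license.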
The above overview was collected and summarized by 遇见数据集 ("Meet Datasets").
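The overview notes that TechHazardQA is distributed as CSV and is gated behind an access agreement. Assuming you have accepted those terms on Hugging Face and downloaded the CSV locally, the standard-library sketch below reads it into dictionaries keyed by the header row and tallies rows per value of one column. The helper names are hypothetical, and nothing here reflects the dataset's actual schema: pass whatever column name appears in the file's header.

```python
import csv
from collections import Counter
from typing import Dict, Iterable, List


def read_rows(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Parse CSV text (an open file or any iterable of lines) into dicts
    keyed by the header row."""
    return list(csv.DictReader(lines))


def load_rows(path: str) -> List[Dict[str, str]]:
    """Read a local CSV file; the csv module expects newline=''."""
    with open(path, newline="", encoding="utf-8") as f:
        return read_rows(f)


def count_by(rows: List[Dict[str, str]], column: str) -> Counter:
    """Count rows per distinct value of `column`."""
    return Counter(row[column] for row in rows)
```

For instance, `rows = load_rows("TechHazardQA.csv")` (a hypothetical local file name) followed by `len(rows)` would be expected to fall within the 1K-10K range the overview reports, and `count_by(rows, column)` tallies any column named in the header.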