SoftMINER-Group/TechHazardQA

Name: SoftMINER-Group/TechHazardQA
Creator: SoftMINER-Group
Published: 2024-06-20 12:13:53
License: Apache-2.0 (per the dataset card)

Hugging Face · Updated 2024-06-20 · Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/SoftMINER-Group/TechHazardQA

Resource description:

---
license: apache-2.0
---

New Paper! 🎉 🎊

## 🎉🎊 Released new paper on AI safety! 🎉 🎊

Check out our new paper **SafeInfer** at https://arxiv.org/abs/2406.12274 👈

Safety-aligned language models often exhibit fragile and imbalanced safety mechanisms, increasing the likelihood of generating unsafe content. In addition, incorporating new knowledge through editing techniques to language models can further compromise safety. To address these issues, we propose SafeInfer, a context-adaptive, decoding-time safety alignment strategy for generating safe responses to user queries. SafeInfer comprises two phases: the safety amplification phase, which employs safe demonstration examples to adjust the model's hidden states and increase the likelihood of safer outputs, and the safety-guided decoding phase, which influences token selection based on safety-optimized distributions, ensuring the generated content complies with ethical guidelines. Further, we present HarmEval, a novel benchmark for extensive safety evaluations, designed to address potential misuse scenarios in accordance with the policies of leading AI tech giants.

👉 *huggingface*: https://huggingface.co/papers/2406.12274

👉 *arxiv version*: https://arxiv.org/abs/2406.12274

---

# Releasing our new paper **How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries**

👉 Read our paper at https://arxiv.org/abs/2402.15302

🎯 If you are using this dataset, please cite our papers

```
@article{DBLP:journals/corr/abs-2402-15302,
  author     = {Somnath Banerjee and Sayan Layek and Rima Hazra and Animesh Mukherjee},
  title      = {How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries},
  journal    = {CoRR},
  volume     = {abs/2402.15302},
  year       = {2024},
  url        = {https://doi.org/10.48550/arXiv.2402.15302},
  doi        = {10.48550/ARXIV.2402.15302},
  eprinttype = {arXiv},
  eprint     = {2402.15302},
  timestamp  = {Fri, 22 Mar 2024 12:19:03 +0100},
  biburl     = {https://dblp.org/rec/journals/corr/abs-2402-15302.bib},
  bibsource  = {dblp computer science bibliography, https://dblp.org}
}

@misc{banerjee2024safeinfer,
  title={SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models},
  author={Somnath Banerjee and Soham Tripathy and Sayan Layek and Shanu Kumar and Animesh Mukherjee and Rima Hazra},
  year={2024},
  eprint={2406.12274},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

### 🌟 If you find our paper interesting and like the dataset, please support us by upvoting and sharing our work! 🌟

Provider:

SoftMINER-Group

Summary of original information

Overview of dataset-related content

Papers related to the dataset

Paper 1

Title: How (un)ethical are instruction-centric responses of LLMs? Unveiling the vulnerabilities of safety guardrails to harmful queries
Authors: Somnath Banerjee, Sayan Layek, Rima Hazra, Animesh Mukherjee
Year: 2024
Link: arXiv (https://arxiv.org/abs/2402.15302)
DOI: 10.48550/ARXIV.2402.15302

Paper 2

Title: SafeInfer: Context Adaptive Decoding Time Safety Alignment for Large Language Models
Authors: Somnath Banerjee, Soham Tripathy, Sayan Layek, Shanu Kumar, Animesh Mukherjee, Rima Hazra
Year: 2024
Link: arXiv (https://arxiv.org/abs/2406.12274)
Primary category: Computation and Language (cs.CL)

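Collected summary

Dataset introduction

Background and challenges

Background overview

TechHazardQA is a dataset for evaluating how safely large language models respond to harmful queries, intended for research on AI ethics and model vulnerabilities. It is associated with the two papers listed above, including the SafeInfer safety-alignment method. Access is gated: users must agree not to use the data for harmful experiments. The dataset is distributed in CSV format, is small (1K-10K records), and is released under the Apache-2.0 license.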

The content above was collected and summarized by 遇见数据集.
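
Because the summary describes a CSV file with between 1K and 10K records, a quick sanity check after downloading could look like the sketch below. It is only an illustration under stated assumptions: the helper names `summarize_csv` and `in_listed_size_range` are hypothetical, no column names are assumed because the listing does not give any, and obtaining the file still requires accepting the dataset's access terms on Hugging Face.

```python
import csv
from typing import TextIO


def summarize_csv(handle: TextIO) -> dict:
    """Return the header row and the number of data rows in a CSV stream."""
    reader = csv.reader(handle)
    header = next(reader, [])        # first row is treated as the header
    n_rows = sum(1 for _ in reader)  # every remaining row counts as one record
    return {"columns": header, "rows": n_rows}


def in_listed_size_range(n_rows: int, low: int = 1_000, high: int = 10_000) -> bool:
    """Check a row count against the 1K-10K range stated on the listing."""
    return low <= n_rows <= high
```

To use it, open the downloaded file with `open(path, newline="", encoding="utf-8")` and pass the handle to `summarize_csv`; opening with `newline=""` lets the `csv` module handle quoted fields that span several lines, which is likely for free-text question data.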