MultiJail

Name: MultiJail
Creator: maas
Published: 2025-12-26 16:21:01
License: No description available

ModelScope community | Updated 2025-12-26 | Indexed 2025-01-25

Download link:

https://modelscope.cn/datasets/DAMO-NLP-SG/MultiJail

Description:

# Multilingual Jailbreak Challenges in Large Language Models

This repo contains the data for our paper ["Multilingual Jailbreak Challenges in Large Language Models"](https://arxiv.org/abs/2310.06474). [[Github repo]](https://github.com/DAMO-NLP-SG/multilingual-safety-for-LLMs/)

## Annotation Statistics

We collected a total of 315 English unsafe prompts and annotated them into nine non-English languages. The languages were categorized based on resource availability, as shown below:

**High-resource languages:** Chinese (zh), Italian (it), Vietnamese (vi)

**Medium-resource languages:** Arabic (ar), Korean (ko), Thai (th)

**Low-resource languages:** Bengali (bn), Swahili (sw), Javanese (jv)

## Ethics Statement

Our research investigates the safety challenges of LLMs in multilingual settings. We are aware of the potential misuse of our findings and emphasize that our research is solely for academic purposes and ethical use. Misuse or harm resulting from the information in this paper is strongly discouraged. To address the identified risks and vulnerabilities, we commit to open-sourcing the data used in our study. This openness aims to facilitate vulnerability identification, encourage discussions, and foster collaborative efforts to enhance LLM safety in multilingual contexts.

Furthermore, we have developed the SELF-DEFENSE framework to address multilingual jailbreak challenges in LLMs. This framework automatically generates multilingual safety training data to mitigate risks associated with unintentional and intentional jailbreak scenarios. Overall, our work not only highlights multilingual jailbreak challenges in LLMs but also paves the way for future research, collaboration, and innovation to enhance their safety.
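
The language grouping from the Annotation Statistics section can be written down as a small lookup table, which is handy when aggregating results per resource tier. The following is a minimal sketch, not part of the released code: the names `RESOURCE_TIERS`, `tier_of`, and `total_prompts` are hypothetical, the `"source"` label for English is an assumption of this sketch, and the prompt totals assume that each of the 315 English prompts has exactly one version in every listed language.

```python
# Language codes and resource tiers as listed in the dataset card.
RESOURCE_TIERS = {
    "high": ["zh", "it", "vi"],
    "medium": ["ar", "ko", "th"],
    "low": ["bn", "sw", "jv"],
}

# Number of English unsafe prompts stated in the dataset card.
NUM_ENGLISH_PROMPTS = 315


def tier_of(lang: str) -> str:
    """Return the resource tier for a language code.

    English is the source language; labeling it "source" is an
    assumption of this sketch, not a tier defined by the card.
    """
    if lang == "en":
        return "source"
    for tier, langs in RESOURCE_TIERS.items():
        if lang in langs:
            return tier
    raise KeyError(f"unknown language code: {lang}")


def total_prompts(include_english: bool = True) -> int:
    """Count prompts, assuming one version of each prompt per language."""
    n_langs = sum(len(langs) for langs in RESOURCE_TIERS.values())
    if include_english:
        n_langs += 1
    return NUM_ENGLISH_PROMPTS * n_langs


if __name__ == "__main__":
    print(tier_of("sw"))
    print(total_prompts())
```

Under these assumptions, the nine non-English languages account for 315 × 9 = 2,835 prompts, and including English brings the total to 3,150.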

## Citation

```
@misc{deng2023multilingual,
      title={Multilingual Jailbreak Challenges in Large Language Models},
      author={Yue Deng and Wenxuan Zhang and Sinno Jialin Pan and Lidong Bing},
      year={2023},
      eprint={2310.06474},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```

Provider:

maas

Created:

2025-01-20
