Nemotron-Safety-Guard-Dataset-v3
ModelScope Community, updated 2025-12-26, indexed 2025-11-03
Download link:
https://modelscope.cn/datasets/nv-community/Nemotron-Safety-Guard-Dataset-v3
Resource summary:
## Dataset Description:
The Nemotron-Safety-Guard-Dataset-v3 is a large, high-quality safety dataset designed for training multilingual LLM safety guard models. It comprises 386,661 samples across 9 languages: English, Arabic, German, Spanish, French, Hindi, Japanese, Thai, and Mandarin.
This dataset is primarily synthetically generated using the <a href="https://arxiv.org/abs/2508.01710">CultureGuard</a> pipeline, which culturally adapts and translates content from the English <a href="https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0">Aegis 2.0</a> safety dataset. It also includes synthetically curated jailbreaking prompt-response pairs to improve detection of adversarial inputs. The dataset follows the Aegis 2.0 safety risk taxonomy, which comprises 12 top-level hazard categories and 9 fine-grained subcategories. Its key feature is cultural alignment across languages, which addresses a common problem: LLMs tend to generate more unsafe content in non-English languages because culturally nuanced safety data is scarce.
This dataset is ready for commercial/non-commercial use. <br>
The model trained on this dataset is available at: <a href="https://huggingface.co/nvidia/Llama-3.1-Nemotron-Safety-Guard-8B-v3">nvidia/Llama-3.1-Nemotron-Safety-Guard-8B-v3</a> <br>
For a detailed description of the dataset and model, please see our <a href="https://arxiv.org/abs/2508.01710">paper</a>.
## Dataset Owner:
NVIDIA Corporation
## Dataset Creation Date:
April 2025
## License/Terms of Use:
CC-BY 4.0
## Intended Usage:
This dataset is intended for training, fine-tuning, and evaluating multilingual LLM safety guard models, particularly to enhance their ability to detect and mitigate harmful content and jailbreaking attempts across diverse languages and cultural contexts. It serves as a crucial resource for advancing robust and culturally-aware LLM safety research and development.
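The card does not prescribe an evaluation protocol. As one hedged illustration of the "evaluating" use case, the sketch below computes F1 for the `"unsafe"` class separately for each `language`, comparing gold `prompt_label` values with predictions from some guard model. The function name `per_language_f1` and the choice of unsafe-class F1 are assumptions for illustration, not necessarily the metric used in the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def per_language_f1(pairs: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Compute F1 for the "unsafe" class per language (hypothetical helper).

    Each item is (language, gold_label, predicted_label); any label other
    than "unsafe" is treated as "safe".
    """
    counts = defaultdict(lambda: [0, 0, 0])  # language -> [tp, fp, fn]
    for lang, gold, pred in pairs:
        c = counts[lang]
        if pred == "unsafe" and gold == "unsafe":
            c[0] += 1  # true positive
        elif pred == "unsafe":
            c[1] += 1  # false positive: safe content flagged as unsafe
        elif gold == "unsafe":
            c[2] += 1  # false negative: unsafe content missed
    scores = {}
    for lang, (tp, fp, fn) in counts.items():
        # F1 = 2*TP / (2*TP + FP + FN); report 0.0 when there is nothing to score.
        denom = 2 * tp + fp + fn
        scores[lang] = 2 * tp / denom if denom else 0.0
    return scores
```

Reporting scores per language makes it easier to see whether a guard model lags in some languages relative to English, which is the gap this dataset is meant to close.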
## Dataset Characterization
Considering the sensitive nature of this project, all data was synthetically generated, and human annotators did not curate any new data. <br>
**Data Collection Method** <br>
Hybrid: Human, Synthetic
**Labeling Method** <br>
Hybrid: Human, Synthetic
## Dataset Format
Text
## Dataset Quantification
493 MB of prompt-only samples and prompt-response pairs, totaling 386,661 samples.
To support future research, the culturally adapted samples (in both English and the target language) are also available as a <a href="https://huggingface.co/datasets/nvidia/Nemotron-Safety-Guard-Dataset-v3/blob/main/adapted_samples.zip">separate download</a>.
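A minimal sketch for checking these quantities against a local copy, assuming each line has already been parsed into a dict (for example with `json.loads`). The helper name `summarize` is hypothetical; it counts samples in total, per `language`, and per `tag`, and separates prompt-only samples (where `response` is `null`) from prompt-response pairs.

```python
from collections import Counter
from typing import Iterable, Mapping


def summarize(records: Iterable[Mapping]) -> dict:
    """Count samples overall, per language, per tag, and by sample type (hypothetical helper)."""
    by_language: Counter = Counter()
    by_tag: Counter = Counter()
    total = 0
    prompt_only = 0
    for rec in records:
        total += 1
        by_language[rec["language"]] += 1
        by_tag[rec["tag"]] += 1
        # Per the field descriptions, prompt-only samples have a null response.
        if rec.get("response") is None:
            prompt_only += 1
    return {
        "total": total,
        "by_language": dict(by_language),
        "by_tag": dict(by_tag),
        "prompt_only": prompt_only,
        "prompt_response": total - prompt_only,
    }
```

If the local copy matches this card, `total` should equal the 386,661 samples stated above.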
## Dataset Structure
The dataset is provided in the **JSON Lines (`.jsonl`) format**. Each line in the file is a separate JSON object representing one data sample. Each object contains 11 fields.
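The sketch below reads a JSON Lines file one record at a time and checks that each object carries the 11 fields listed under Data Fields. The helper `iter_samples` is hypothetical. It relies on Python's `json` module accepting the non-standard `NaN` token by default; whether the files store a missing `reconstruction_id_if_redacted` as `NaN` or `null` is an assumption, and the loader parses either.

```python
import json
from typing import Iterator, TextIO

# The 11 field names listed under "Data Fields" in this card.
EXPECTED_FIELDS = frozenset({
    "id", "prompt", "response", "prompt_label", "response_label",
    "violated_categories", "prompt_label_source", "response_label_source",
    "tag", "language", "reconstruction_id_if_redacted",
})


def iter_samples(stream: TextIO) -> Iterator[dict]:
    """Yield one dict per JSON Lines record, raising if an expected field is missing."""
    for line_no, line in enumerate(stream, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines such as a trailing newline
        # json.loads accepts the NaN token by default, so a NaN
        # reconstruction ID parses to float("nan").
        record = json.loads(line)
        missing = EXPECTED_FIELDS.difference(record)
        if missing:
            raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
        yield record
```

For example, open a file with `open(path, encoding="utf-8")` (the path is whatever your local copy uses) and iterate over `iter_samples(f)`; records are streamed, so the full 493 MB never has to be held in memory at once.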
### Data Fields
* **`id`**: (string) A unique identifier for the sample. This identifier also maps directly to the corresponding sample in the original Aegis 2.0 dataset.
* **`prompt`**: (string) The user's turn in the conversation. For certain samples derived from the [Suicide Detection dataset](https://www.kaggle.com/datasets/nikhileswarkomati/suicide-watch?select=Suicide_Detection.csv), the prompt is redacted and this field contains the value `"REDACTED"`. To reconstruct such a sample, retrieve the original English prompt from that external dataset using `reconstruction_id_if_redacted`, then translate or culturally adapt it to the target language.
* **`response`**: (string | null) The assistant's turn in the conversation. This is `null` for prompt-only samples.
* **`prompt_label`**: (string) The binary safety label for the prompt, either `"safe"` or `"unsafe"`.
* **`response_label`**: (string | null) The binary safety label for the response. This is `null` or an empty string `""` for prompt-only samples.
* **`violated_categories`**: (string) A comma-separated list of safety categories violated by the content. This is an empty string `""` if the content is safe.
* **`prompt_label_source`**: (string) The source of the prompt's annotation, which is `"human"` or `"llm_jury"`.
* **`response_label_source`**: (string | null) The source of the response's annotation. Possible values are `"human"`, `"llm_jury"`, or `"refusal_data_augmentation"`.
* **`tag`**: (string) A tag for internal categorization with possible values: `"generic"` (samples translated from the original dataset), `"jailbreaking"` (newly created jailbreaking prompts), and `"adapted"` (samples that were culturally adapted using LLMs and then translated).
* **`language`**: (string) The [ISO 639-1 code](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) for the language of the sample (e.g., `"de"` for German).
* **`reconstruction_id_if_redacted`**: (float | NaN) If the prompt is `"REDACTED"`, this field contains the ID corresponding to the original sample in the external Suicide Detection dataset. Otherwise, it is `NaN`.
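The field conventions above (labels that may be `null` or `""`, comma-separated categories, a float or `NaN` reconstruction ID) are easy to mishandle, so a normalization step can help. This is a minimal sketch under those stated conventions. `Sample`, `normalize`, and `fill_redacted_prompt` are hypothetical names, and `english_by_id` is an assumed mapping from reconstruction ID to original English text that you would build from the external Suicide Detection CSV; how that CSV's rows correspond to the ID is an assumption not covered here. Translating or culturally adapting the recovered text is also out of scope.

```python
import math
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class Sample:
    sample_id: str
    language: str
    prompt: Optional[str]             # None while the prompt is still "REDACTED"
    response: Optional[str]           # None for prompt-only samples
    prompt_unsafe: bool
    response_unsafe: Optional[bool]   # None for prompt-only samples
    violated_categories: List[str]    # empty when the content is safe
    reconstruction_id: Optional[int]  # set only for redacted prompts


def label_to_bool(label: Optional[str]) -> Optional[bool]:
    """Map "safe"/"unsafe" to False/True; treat null and "" as "no label"."""
    if label is None or label == "":
        return None
    if label not in ("safe", "unsafe"):
        raise ValueError(f"unexpected label: {label!r}")
    return label == "unsafe"


def normalize(rec: dict) -> Sample:
    redacted = rec["prompt"] == "REDACTED"
    recon_id = None
    if redacted:
        raw_id = rec.get("reconstruction_id_if_redacted")
        # The ID is stored as a float; NaN or null would mean it is missing.
        if raw_id is None or math.isnan(raw_id):
            raise ValueError(f"{rec['id']}: redacted prompt without reconstruction ID")
        recon_id = int(raw_id)
    prompt_unsafe = label_to_bool(rec["prompt_label"])
    if prompt_unsafe is None:
        raise ValueError(f"{rec['id']}: missing prompt_label")
    # Split the comma-separated list, dropping whitespace and empty entries.
    raw_cats = rec.get("violated_categories") or ""
    categories = [c.strip() for c in raw_cats.split(",") if c.strip()]
    return Sample(
        sample_id=rec["id"],
        language=rec["language"],
        prompt=None if redacted else rec["prompt"],
        response=rec.get("response"),
        prompt_unsafe=prompt_unsafe,
        response_unsafe=label_to_bool(rec.get("response_label")),
        violated_categories=categories,
        reconstruction_id=recon_id,
    )


def fill_redacted_prompt(sample: Sample, english_by_id: Mapping[int, str]) -> str:
    """Return the original English text for a redacted prompt (hypothetical helper)."""
    if sample.reconstruction_id is None:
        raise ValueError("sample is not redacted")
    # The result is still English; non-English samples need translation afterwards.
    return english_by_id[sample.reconstruction_id]
```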
## Bias, Risks, and Limitations
- Safety and Moderation: This dataset is intended for building content moderation systems or aligning LLMs to generate safe responses. By the nature of the work, the dataset contains critically unsafe content and annotations for that content. Exercise extreme care and caution when referring to or using this dataset.
- Legal Compliance: Users of this data are responsible for ensuring its appropriate use. The dataset should not be utilized in manners that conflict with legal and ethical standards.
## Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using this dataset in accordance with our terms of service, developers should work with their internal model team to ensure this dataset meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
## Citing
If you find our work helpful, please consider citing our paper:
```
@article{joshi2025cultureguard,
title={CultureGuard: Towards Culturally-Aware Dataset and Guard Model for Multilingual Safety Applications},
author={Joshi, Raviraj and Paul, Rakesh and Singla, Kanishk and Kamath, Anusha and Evans, Michael and Luna, Katherine and Ghosh, Shaona and Vaidya, Utkarsh and Long, Eileen and Chauhan, Sanjay Singh and others},
journal={arXiv preprint arXiv:2508.01710},
year={2025}
}
```
Provider:
maas
Created:
2025-10-29
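Background overview:
Nemotron-Safety-Guard-Dataset-v3 is a multilingual LLM safety guard dataset containing 514,617 samples across 12 languages. It is intended mainly for training and evaluating multilingual LLM safety guard models that detect and mitigate harmful content and jailbreak attempts. The dataset is synthetically generated with the CultureGuard pipeline, is culturally aligned, and may be used for commercial and non-commercial purposes.
The above summary was collected and generated by 遇见数据集 ("Meet Datasets").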



