tanaos/synthetic-guardrail-dataset-german
Source: Hugging Face. Updated 2026-02-10; indexed 2026-04-05.
Download link:
https://hf-mirror.com/datasets/tanaos/synthetic-guardrail-dataset-german
Resource description:
---
language:
- de
license: mit
tags:
- moderation
- guardrail
- text-classification
- toxicity-detection
- llm-safety
- content-safety
- alignment
- ethical-ai
- synthetic-data
- tanaos
pretty_name: tanaos-guardrail-german Training Dataset
task_categories:
- text-classification
task_ids:
- hate-speech-detection
- sentiment-classification
size_categories:
- 10K<n<20K
---
<p align="center">
<img src="https://raw.githubusercontent.com/tanaos/.github/master/assets/logo.png" width="250px" alt="Tanaos – Train task specific LLMs without training data, for offline NLP and Text Classification">
</p>
# Tanaos Guardrail German Training Dataset
This dataset was created synthetically by Tanaos with the [Artifex](https://github.com/tanaos/artifex) Python library.
The dataset is designed to **train and evaluate guardrail systems** — models that detect, classify, or filter unsafe, harmful or potentially dangerous content — in German. It can be used to **train moderation models** or integrate **LLM safety filters** for applications like chatbots, content generation, and user-facing AI systems.
Our German guardrail model, [tanaos-guardrail-german](https://huggingface.co/tanaos/tanaos-guardrail-german), was trained on this dataset.
## Dataset Summary
The dataset contains text samples, each paired with an array of 14 binary labels. A label of `1` indicates that the text falls into the corresponding unsafe category, while a label of `0` indicates that it does not. The categories are the following:
| Label Index | Category | Description |
|-------------|---------------------------|--------------------------------------------------------|
| 0 | violence | Content describing or encouraging violent acts |
| 1 | non_violent_unethical | Content containing hateful or discriminatory language |
| 2 | financial_crime | Content related to financial fraud or scams |
| 3 | discrimination | Content promoting discrimination against individuals or groups |
| 4 | drug_weapons | Content related to illegal drugs or weapons |
| 5 | self_harm | Content encouraging self-harm or suicide |
| 7 | privacy | Content that invades personal privacy or shares private information |
| 8 | sexual_content | Content that is sexually explicit or inappropriate |
| 9 | child_abuse | Content involving the exploitation or abuse of children |
| 10 | terrorism_organized_crime | Content related to terrorism or organized crime |
| 11 | hacking | Content related to unauthorized computer access or cyberattacks |
| 12 | animal_abuse | Content involving the abuse or mistreatment of animals |
| 13 | jailbreak_prompt_inj | Content attempting to bypass or manipulate system instructions or safeguards |
For instance, the label `[0 0 0 0 0 0 0 0 0 0 0 0 0 0]` means that the corresponding text is safe, while the label `[0 1 0 0 0 0 0 0 0 0 0 0 0 1]` means that the text is unsafe due to the presence of both `non_violent_unethical` and `jailbreak_prompt_inj` content.
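As a minimal sketch of how such a label array can be read, the snippet below maps the positions set to `1` back to the category names from the table. The helper names `decode_labels` and `is_safe` are hypothetical and not part of the dataset or of Artifex. Index 6 has no row in the table, so the sketch reports it by position rather than guessing a name.

```python
# Category names by label index, copied from the table in this card.
# Index 6 has no row in the table, so it is intentionally left unnamed.
CATEGORIES = {
    0: "violence",
    1: "non_violent_unethical",
    2: "financial_crime",
    3: "discrimination",
    4: "drug_weapons",
    5: "self_harm",
    7: "privacy",
    8: "sexual_content",
    9: "child_abuse",
    10: "terrorism_organized_crime",
    11: "hacking",
    12: "animal_abuse",
    13: "jailbreak_prompt_inj",
}
NUM_LABELS = 14


def decode_labels(labels):
    """Return the names of all categories whose label is 1 (hypothetical helper)."""
    if len(labels) != NUM_LABELS:
        raise ValueError(f"expected {NUM_LABELS} labels, got {len(labels)}")
    # Unnamed positions fall back to a positional placeholder such as "index_6".
    return [
        CATEGORIES.get(i, f"index_{i}")
        for i, flag in enumerate(labels)
        if flag == 1
    ]


def is_safe(labels):
    """A text is safe when no category label is set (hypothetical helper)."""
    return not any(labels)


example = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(decode_labels(example))
print(is_safe(example))
```

Keeping the mapping as a dictionary keyed by index, rather than a plain list, avoids silently shifting every name after the unlisted position.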
## How to Use
```python
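from datasets import load_dataset
# Download the dataset from the Hugging Face Hub (cached after the first call)
dataset = load_dataset("tanaos/synthetic-guardrail-dataset-german")
# Inspect the first example of the training split
print(dataset["train"][0])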
```
## Intended Use
This dataset is meant for **training, fine-tuning, and evaluating** models that act as **guardrails** for AI systems handling German-language content.
Common use cases:
- Detecting and filtering toxic or policy-violating user input
- Reinforcing LLMs with content safety constraints
- Improving safety layers in production AI assistants or chatbots
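The filtering use case can be sketched as follows, assuming a hypothetical multi-label classifier that returns one score in the range 0 to 1 for each of the 14 categories. The function names and the `0.5` threshold are illustrative assumptions, not part of the dataset, the Tanaos model, or any library; a real deployment would tune the threshold on held-out data.

```python
from typing import List, Sequence

NUM_LABELS = 14  # one score per category, in the same order as the dataset labels


def to_binary_labels(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Convert per-category scores into the dataset's 0/1 label format.

    `scores` is assumed to come from a hypothetical classifier; the
    threshold is an arbitrary illustrative default.
    """
    if len(scores) != NUM_LABELS:
        raise ValueError(f"expected {NUM_LABELS} scores, got {len(scores)}")
    return [1 if score >= threshold else 0 for score in scores]


def should_block(scores: Sequence[float], threshold: float = 0.5) -> bool:
    """Block the input if any category is flagged."""
    return any(to_binary_labels(scores, threshold))


# Made-up scores for illustration: only the last category is above threshold.
scores = [0.1] * 13 + [0.9]
print(to_binary_labels(scores))
print(should_block(scores))
```

Blocking on any flagged category is the strictest policy; an application could instead apply separate thresholds or actions per category.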
Provider:
tanaos



