tessimago/ai_safety_50k
Source: Hugging Face. Updated 2025-11-19; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/tessimago/ai_safety_50k
Resource description:
---
license: apache-2.0
task_categories:
- text-classification
- zero-shot-classification
language:
- en
tags:
- ai-safety
- red-teaming
- conversations
- safety-evaluation
size_categories:
- 10K<n<100K
configs:
- config_name: default
  data_files:
  - split: english
    path: data/english-*
  - split: portuguese
    path: data/portuguese-*
dataset_info:
  features:
  - name: id
    dtype: int64
  - name: conversation
    list:
    - name: content
      dtype: string
    - name: role
      dtype: string
  - name: topic
    dtype: string
  - name: subtopic
    dtype: string
  - name: strategy
    dtype: string
  - name: original_id
    dtype: string
  splits:
  - name: english
    num_bytes: 48171403
    num_examples: 50400
  - name: portuguese
    num_bytes: 53940649
    num_examples: 50400
  download_size: 49120429
  dataset_size: 102112052
---
# AI Safety 50K Dataset
## Overview
This dataset contains 50,400 conversations designed for AI safety research and red-teaming evaluation. The metadata lists two splits, `english` and `portuguese`, each with 50,400 examples. Each conversation is a **single-turn** interaction between a user and an AI assistant, covering a range of sensitive topics and attack strategies. All samples were generated using **DeepSeek-V3.2-Exp**.
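As a minimal sketch of how the data might be loaded, the snippet below wraps `load_dataset` from the Hugging Face `datasets` library. The repository ID and split names are taken from this card's metadata; the `load_split` helper itself is a hypothetical convenience and not part of the dataset.

```python
from typing import Any

REPO_ID = "tessimago/ai_safety_50k"
# Split names as listed in the card's metadata.
SPLITS = ("english", "portuguese")


def load_split(split: str = "english") -> Any:
    """Load one split from the Hugging Face Hub.

    Hypothetical helper; requires `pip install datasets` and network access.
    """
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so this module can be imported without the dependency.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)


# Example usage (downloads data, so it is left commented out):
# ds = load_split("english")
# print(ds[0]["topic"], ds[0]["strategy"])
```

Validating the split name before importing `datasets` keeps a typo from triggering a download attempt.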
## Dataset Structure
### Core Statistics
- **Total Samples**: 50,400 conversations per split
- **General Topics**: 13 high-level categories
- **Subtopics**: 72 specific subcategories
- **Strategies**: 35 different attack/evaluation strategies
### Data Distribution
The dataset is perfectly balanced with exactly **20 samples** for each unique combination of:
- Subtopic (72 options)
- Strategy (35 options)
This results in: 72 subtopics × 35 strategies × 20 samples = **50,400 total conversations**
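The balance claim above can be checked mechanically. The following sketch counts samples per (subtopic, strategy) pair and confirms that every pair has the expected count. It assumes rows are dict-like objects with `subtopic` and `strategy` keys, as in the field list of this card; the function names are illustrative.

```python
from collections import Counter
from typing import Iterable, Mapping

N_SUBTOPICS, N_STRATEGIES, PER_CELL = 72, 35, 20
EXPECTED_TOTAL = N_SUBTOPICS * N_STRATEGIES * PER_CELL  # 72 * 35 * 20


def cell_counts(rows: Iterable[Mapping[str, str]]) -> Counter:
    """Count samples per (subtopic, strategy) pair."""
    return Counter((row["subtopic"], row["strategy"]) for row in rows)


def is_balanced(
    rows: Iterable[Mapping[str, str]],
    n_cells: int = N_SUBTOPICS * N_STRATEGIES,
    per_cell: int = PER_CELL,
) -> bool:
    """True if exactly `n_cells` pairs occur and each occurs `per_cell` times."""
    counts = cell_counts(rows)
    return len(counts) == n_cells and all(n == per_cell for n in counts.values())
```

Applied to one split, `is_balanced(rows)` should hold if the card's description is accurate; a failure would show either a missing pair or an uneven count.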
## Dataset Fields
Each sample contains the following fields:
- `id`: Unique integer identifier for the conversation
- `conversation`: List of message objects, each with string `role` and `content` fields
- `topic`: General topic category (13 categories)
- `subtopic`: Specific subtopic within the general topic (72 subtopics)
- `strategy`: Attack or evaluation strategy used (35 strategies)
- `original_id`: String identifier listed in the dataset metadata; this card does not describe it further
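To make the record layout concrete, here is a sketch of the schema as Python `TypedDict` classes, with a small validator. Field names and types follow the `dataset_info` metadata of this card (which also lists a string `original_id`). The requirement that `conversation` be non-empty is an assumption, and `validate_sample` is a hypothetical helper.

```python
from typing import Any, List, TypedDict


class Message(TypedDict):
    content: str
    role: str


class Sample(TypedDict):
    id: int
    conversation: List[Message]
    topic: str
    subtopic: str
    strategy: str
    original_id: str


STRING_FIELDS = ("topic", "subtopic", "strategy", "original_id")


def validate_sample(row: Any) -> List[str]:
    """Return a list of schema problems; an empty list means the row looks valid."""
    if not isinstance(row, dict):
        return ["row is not a dict"]
    problems: List[str] = []
    if not isinstance(row.get("id"), int):
        problems.append("id must be an int")
    for name in STRING_FIELDS:
        if not isinstance(row.get(name), str):
            problems.append(f"{name} must be a string")
    conversation = row.get("conversation")
    # Assumption: every sample carries at least one message.
    if not isinstance(conversation, list) or not conversation:
        problems.append("conversation must be a non-empty list")
    else:
        for i, msg in enumerate(conversation):
            ok = (
                isinstance(msg, dict)
                and isinstance(msg.get("role"), str)
                and isinstance(msg.get("content"), str)
            )
            if not ok:
                problems.append(f"conversation[{i}] needs string role and content")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to audit a whole split in a single pass.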
## Topic Categories
The dataset covers 13 general topic areas related to AI safety:
1. **Physical Harm** - Violence, weapons, terrorism
2. **Sexual Abuse** - Exploitation, non-consensual content
3. **Child Safety** - Child-related harmful content
4. **Hate Speech** - Discrimination, prejudice
5. **Cybercrime** - Hacking, malware, fraud
6. **Illegal Activities** - Drug manufacturing, theft, scams
7. **Privacy Violations** - Surveillance, stalking, doxing
8. **Information Abuse** - Misinformation, propaganda
9. **Psychological Manipulation** - Brainwashing, cult recruitment
10. **Security Threats** - Jailbreaking, prompt injection
11. **Ethical Violations** - Human rights abuses, discrimination
12. **Personal Safety** - Harassment, blackmail
13. **Regulatory Violations** - Legal evasion, sanctions circumvention
Each general topic contains multiple specific subtopics, totaling 72 unique subtopic categories.
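The topic/subtopic hierarchy can be recovered from the rows themselves. This sketch groups distinct subtopic labels under each topic and reports how many topics and subtopics appear. It assumes each row exposes `topic` and `subtopic` strings, as in the field list; the helper names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set, Tuple


def subtopics_by_topic(rows: Iterable[Mapping[str, str]]) -> Dict[str, Set[str]]:
    """Group the distinct subtopic labels under each topic label."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for row in rows:
        grouped[row["topic"]].add(row["subtopic"])
    return dict(grouped)


def hierarchy_summary(rows: Iterable[Mapping[str, str]]) -> Tuple[int, int]:
    """Return (number of topics, number of distinct subtopics)."""
    grouped = subtopics_by_topic(rows)
    # A subtopic label shared by two topics is counted once here.
    all_subtopics: Set[str] = set().union(*grouped.values())
    return len(grouped), len(all_subtopics)
```

If the figures in this card hold, running `hierarchy_summary` on a full split should report 13 topics and 72 subtopics.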
## Use Cases
This dataset is intended for:
- **AI Safety Research**: Evaluating model robustness against harmful prompts
- **Red Team Evaluation**: Testing AI systems for potential vulnerabilities
- **Safety Alignment**: Training safer AI models through adversarial examples
- **Content Moderation**: Developing better content filtering systems
- **Academic Research**: Studying AI behavior in sensitive contexts
## Ethical Considerations
This dataset contains sensitive content designed for AI safety research. It should be used responsibly and only for legitimate research purposes. The content may include references to harmful activities but is intended solely for improving AI safety and robustness.
Provider:
tessimago



