five

tessimago/ai_safety_50k

收藏
Hugging Face2025-11-19 更新2025-12-20 收录
下载链接:
https://hf-mirror.com/datasets/tessimago/ai_safety_50k
下载链接
链接失效反馈
官方服务:
资源简介:
--- license: apache-2.0 task_categories: - text-classification - zero-shot-classification language: - en tags: - ai-safety - red-teaming - conversations - safety-evaluation size_categories: - 10K<n<100K configs: - config_name: default data_files: - split: english path: data/english-* - split: portuguese path: data/portuguese-* dataset_info: features: - name: id dtype: int64 - name: conversation list: - name: content dtype: string - name: role dtype: string - name: topic dtype: string - name: subtopic dtype: string - name: strategy dtype: string - name: original_id dtype: string splits: - name: english num_bytes: 48171403 num_examples: 50400 - name: portuguese num_bytes: 53940649 num_examples: 50400 download_size: 49120429 dataset_size: 102112052 --- # AI Safety 50K Dataset ## Overview This dataset contains 50,400 conversations designed for AI safety research and red-teaming evaluation. Each conversation represents a **single-turn** interaction between a user and an AI assistant, covering various sensitive topics and attack strategies. All samples were generated using **DeepSeek-V3.2-Exp**. ## Dataset Structure ### Core Statistics - **Total Samples**: 50,400 conversations - **General Topics**: 13 high-level categories - **Subtopics**: 72 specific subcategories - **Strategies**: 35 different attack/evaluation strategies ### Data Distribution The dataset is perfectly balanced with exactly **20 samples** for each unique combination of: - Subtopic (72 options) - Strategy (35 options) This results in: 72 subtopics × 35 strategies × 20 samples = **50,400 total conversations** ## Dataset Fields Each sample contains the following fields: - `id`: Unique identifier for the conversation - `conversation`: List of message objects with `role` and `content` fields - `topic`: General topic category (13 categories) - `subtopic`: Specific subtopic within the general topic (72 subtopics) - `strategy`: Attack or evaluation strategy used (35 strategies) ## Topic Categories The dataset covers 13 general topic areas related to AI safety: 1. **Physical Harm** - Violence, weapons, terrorism 2. **Sexual Abuse** - Exploitation, non-consensual content 3. **Child Safety** - Child-related harmful content 4. **Hate Speech** - Discrimination, prejudice 5. **Cybercrime** - Hacking, malware, fraud 6. **Illegal Activities** - Drug manufacturing, theft, scams 7. **Privacy Violations** - Surveillance, stalking, doxing 8. **Information Abuse** - Misinformation, propaganda 9. **Psychological Manipulation** - Brainwashing, cult recruitment 10. **Security Threats** - Jailbreaking, prompt injection 11. **Ethical Violations** - Human rights abuses, discrimination 12. **Personal Safety** - Harassment, blackmail 13. **Regulatory Violations** - Legal evasion, sanctions circumvention Each general topic contains multiple specific subtopics, totaling 72 unique subtopic categories. ## Use Cases This dataset is intended for: - **AI Safety Research**: Evaluating model robustness against harmful prompts - **Red Team Evaluation**: Testing AI systems for potential vulnerabilities - **Safety Alignment**: Training safer AI models through adversarial examples - **Content Moderation**: Developing better content filtering systems - **Academic Research**: Studying AI behavior in sensitive contexts ## Ethical Considerations This dataset contains sensitive content designed for AI safety research. It should be used responsibly and only for legitimate research purposes. The content may include references to harmful activities but is intended solely for improving AI safety and robustness.
提供机构:
tessimago
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作