mecoaoge2/safety-merged2
Hugging Face · updated 2026-04-08 · indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/mecoaoge2/safety-merged2
Description:
# HuggingFace Safety Datasets — Observations
---
## Taxonomy of dataset types
### Type A — Simple text classification
```
columns: [text/prompt/tweet/comment_text] + [label/category/is_benign]
example: jackhhao/jailbreak-classification
normalize: [{user: text}] + label
~60% of datasets
```
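A minimal sketch of the Type A normalization, assuming each row is a plain dict and that the first matching column name wins. The function names and the lookup order are illustrative assumptions, not part of the original pipeline.

```python
from typing import Any, Optional

# Candidate column names from the Type A description; the order is an assumed priority.
TEXT_COLUMNS = ("text", "prompt", "tweet", "comment_text")
LABEL_COLUMNS = ("label", "category", "is_benign")


def first_present(row: dict, names: tuple) -> Optional[str]:
    """Return the first column name from `names` that exists in the row."""
    for name in names:
        if name in row:
            return name
    return None


def normalize_type_a(row: dict) -> dict:
    """Turn a text-classification row into one user turn plus its raw label."""
    text_col = first_present(row, TEXT_COLUMNS)
    label_col = first_present(row, LABEL_COLUMNS)
    if text_col is None:
        raise KeyError("no text-like column found")
    raw_label: Any = row[label_col] if label_col is not None else None
    return {
        "conversations": [{"role": "user", "content": str(row[text_col])}],
        "label": raw_label,  # left raw; mapping to safe/unsafe is a later step
    }
```

The label is passed through unchanged here so that value mapping can be handled in one place.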
### Type B — Instruction + Response
```
columns: [prompt/instruction/input] + [response/answer/output/completion]
+ optional [label/category]
example: PKU-Alignment/PKU-SafeRLHF-30K
normalize: [{user: prompt}, {assistant: response}] + label
~20% of datasets
```
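The Type B mapping could look roughly like the sketch below. It assumes plain-dict rows and takes the first non-empty prompt and response column; datasets that carry both `instruction` and `input` would need the two merged, which this sketch does not attempt. `pick` and `normalize_type_b` are hypothetical names.

```python
from typing import Any, Optional

PROMPT_COLUMNS = ("prompt", "instruction", "input")
RESPONSE_COLUMNS = ("response", "answer", "output", "completion")


def pick(row: dict, names: tuple) -> Optional[Any]:
    """Return the first non-empty value among the candidate columns."""
    for name in names:
        value = row.get(name)
        if value not in (None, ""):
            return value
    return None


def normalize_type_b(row: dict) -> dict:
    """Build a user/assistant exchange; the label stays raw for later mapping."""
    prompt = pick(row, PROMPT_COLUMNS)
    response = pick(row, RESPONSE_COLUMNS)
    if prompt is None or response is None:
        raise ValueError("row is missing a prompt or a response column")
    # The label is optional for Type B; prefer `label`, fall back to `category`.
    label = row.get("label", row.get("category"))
    return {
        "conversations": [
            {"role": "user", "content": str(prompt)},
            {"role": "assistant", "content": str(response)},
        ],
        "label": label,
    }
```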
### Type C — Preference pairs (DPO)
```
columns: [prompt] + [chosen] + [rejected]
example: Magpie-Align/Magpie-Pro-DPO-100K-v0.1
normalize: 2 rows — chosen→safe, rejected→unsafe
~5% of datasets
```
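As a sketch of the "2 rows" rule, the function below expands one preference row into a safe row and an unsafe row. It assumes `chosen` and `rejected` are plain strings; `normalize_type_c` is a hypothetical name.

```python
def normalize_type_c(row: dict) -> list:
    """Expand one preference row into two labeled conversation rows."""
    rows = []
    # chosen -> safe and rejected -> unsafe, as proposed in the Type C description.
    for column, label in (("chosen", "safe"), ("rejected", "unsafe")):
        rows.append({
            "conversations": [
                {"role": "user", "content": row["prompt"]},
                {"role": "assistant", "content": row[column]},
            ],
            "label": label,
        })
    return rows
```

Note that in general-purpose preference data a rejected answer may only be lower quality rather than unsafe, so this mapping is best applied to safety-focused preference sets.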
### Type D — Multi-label toxicity
```
columns: [text/comment_text] + [toxic, insult, threat, obscene, ...]
example: jigsaw-style datasets
normalize: [{user: text}] + label = [list of toxic categories where value=1]
~5% of datasets
```
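One possible shape for the Type D rule is sketched below. The column list in the description is elided, so the default here covers only the four named columns and the full list is left as a parameter; `normalize_type_d` is a hypothetical name.

```python
# Only the four categories named in the description; extend per dataset.
TOXICITY_COLUMNS = ("toxic", "insult", "threat", "obscene")


def normalize_type_d(row: dict, columns: tuple = TOXICITY_COLUMNS) -> dict:
    """Collect the names of all toxicity columns whose value equals 1."""
    text = row.get("text", row.get("comment_text"))
    active = [name for name in columns if row.get(name) == 1]
    return {
        "conversations": [{"role": "user", "content": text}],
        "label": active,  # an empty list means no category fired
    }
```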
### Type E — Already conversation format
```
columns: [messages/conversations/chat/conversation]
+ optional [label/category]
example: lmsys/lmsys-chat-1m
normalize: parse inner JSON → standard [{role, content}] list
~5% of datasets
```
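A rough sketch of the Type E step follows. It accepts either a parsed list or a JSON string and renames keys to `role`/`content`. The `from`/`value` key aliases and the role aliases are assumptions about common variants, not something the observations above enumerate.

```python
import json

# Assumed aliases for role names seen in some conversation dumps.
ROLE_ALIASES = {"human": "user", "gpt": "assistant", "bot": "assistant"}


def normalize_type_e(raw) -> list:
    """Return a list of {"role", "content"} dicts from a list or a JSON string."""
    turns = json.loads(raw) if isinstance(raw, str) else raw
    messages = []
    for turn in turns:
        # Accept either role/content or the assumed from/value key pair.
        role = turn.get("role", turn.get("from"))
        content = turn.get("content", turn.get("value"))
        messages.append({"role": ROLE_ALIASES.get(role, role), "content": content})
    return messages
```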
### Type F — Ambiguous / complex
```
Unknown or unusual schema
example: datasets with image+text, multi-config, nested structs
normalize: needs the Claude API to infer the schema
~5% of datasets
```
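The taxonomy above can be approximated with a column-name heuristic. The sketch below is one way to do it; the check order (most specific schema first) and the two-column threshold for Type D are assumptions, and anything unmatched falls through to Type F.

```python
CONVERSATION = {"messages", "conversations", "chat", "conversation"}
PROMPT = {"prompt", "instruction", "input"}
RESPONSE = {"response", "answer", "output", "completion"}
TEXT = {"text", "prompt", "tweet", "comment_text"}
LABEL = {"label", "category", "is_benign"}
TOXICITY = {"toxic", "insult", "threat", "obscene"}


def detect_type(columns) -> str:
    """Guess the dataset type (A-F) from its column names alone."""
    cols = {c.lower() for c in columns}
    if cols & CONVERSATION:
        return "E"
    if {"prompt", "chosen", "rejected"} <= cols:
        return "C"
    # Require at least two toxicity flags so a lone `toxic` label stays Type A.
    if len(cols & TOXICITY) >= 2 and cols & {"text", "comment_text"}:
        return "D"
    if cols & PROMPT and cols & RESPONSE:
        return "B"
    if cols & TEXT and cols & LABEL:
        return "A"
    return "F"
```

For example, `detect_type(["instruction", "output"])` returns `"B"`, while an image-plus-caption schema falls through to `"F"`.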
---
## 3. Label normalization problem
The `label` column appears 850 times, but its values vary widely:
| Value pattern | Example datasets | Count |
|---|---|---|
| Binary int `0/1` | jigsaw, most classifiers | ~300 |
| Binary string `"safe"/"unsafe"` | PKU, aegis | ~150 |
| Binary string `"benign"/"toxic"` | various | ~100 |
| `"LABEL_0"/"LABEL_1"` | HuggingFace auto-label | ~80 |
| Multi-class string | `"hate"`, `"violence"`, `"jailbreak"` | ~120 |
| Float score `0.0-1.0` | perspective-api style | ~50 |
| Bool `True/False` | is_benign, is_response_safe | ~50 |
| Non-English | Vietnamese, Arabic, Chinese | ~50 |
**Proposed normalization:**
- Binary int → `0=safe, 1=unsafe`
- Float → threshold at 0.5 to get safe/unsafe, and keep the original value in a `score` field
- Multi-class → keep the original value in the `category` field, and derive `label=unsafe` unless the class is a safe one
- Non-English → keep the original value and add a `label_lang` field
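The proposal above might be sketched as a single function. Several choices here are assumptions: a score exactly at the threshold counts as unsafe, the safe/unsafe string sets cover only the values named in the table, a non-ASCII check is a crude stand-in for language detection, and bool labels are rejected because their polarity depends on the column name (`is_benign` versus a toxicity flag).

```python
SAFE_STRINGS = {"safe", "benign"}
UNSAFE_STRINGS = {"unsafe", "toxic"}


def normalize_label(value, threshold: float = 0.5) -> dict:
    """Map a raw label value to a partial record with the proposed fields."""
    if isinstance(value, bool):
        # bool is a subclass of int, so it must be caught before the int branch.
        raise TypeError("bool labels need column-specific handling")
    if isinstance(value, int):
        if value not in (0, 1):
            raise ValueError(f"not a binary label: {value!r}")
        return {"label": "unsafe" if value == 1 else "safe"}
    if isinstance(value, float):
        label = "unsafe" if value >= threshold else "safe"
        return {"label": label, "score": value}
    if isinstance(value, str):
        key = value.strip().lower()
        if key in SAFE_STRINGS:
            return {"label": "safe"}
        if key in UNSAFE_STRINGS:
            return {"label": "unsafe"}
        if not value.isascii():
            # Keep the original and flag it; real language detection is out of scope.
            return {"label": None, "category": value, "label_lang": "unknown"}
        # Any other class name is treated as an unsafe category.
        return {"label": "unsafe", "category": value}
    raise TypeError(f"unsupported label type: {type(value).__name__}")
```

`"LABEL_0"/"LABEL_1"` values would land in the multi-class branch here; they need a per-dataset mapping that the proposal does not define.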
---
## 4. Edge cases to handle
1. **`chat` column (98 occurrences)**, which may be:
   - JSON string `"[{\"role\":\"user\",...}]"` → parse
   - Python repr `"[{'role': 'user',...}]"` → `eval` (unsafe) or regex
   - Plain text → treat as user message
2. **`chosen`/`rejected`**, which are sometimes:
   - Plain string (response text)
   - List of messages `[{role, content}]`
   - Dict with multiple fields
3. **Multi-config datasets**: one dataset has several configs with different schemas (for example `default` and `harmful`)
4. **Nested columns**: `answers.text`, `answers.answer_start`, `mc1_targets_choices` → flatten
5. **`sys_prompts` (90 occurrences)**: the system prompt usually accompanies the prompt and needs to be prepended to the conversations
6. **Labels from multiple columns**: a dataset has `toxic` (float) + `label` (binary) + `category` (string) at the same time → pick a priority
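Edge cases 1 and 5 can be sketched together. For the Python-repr variant, the standard library's `ast.literal_eval` is used here instead of `eval` or a regex, since it only accepts literals; that substitution is a suggestion, not something the notes above prescribe. `parse_chat` and `with_system_prompt` are hypothetical helper names.

```python
import ast
import json


def parse_chat(raw: str) -> list:
    """Parse a chat cell: JSON first, then a Python literal, then plain text."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        try:
            # literal_eval evaluates literals only, so it cannot run arbitrary code.
            parsed = ast.literal_eval(raw)
        except (ValueError, SyntaxError, TypeError):
            parsed = None
    if isinstance(parsed, list) and all(isinstance(t, dict) for t in parsed):
        return parsed
    # Anything else is treated as a single user message.
    return [{"role": "user", "content": raw}]


def with_system_prompt(messages: list, sys_prompt=None) -> list:
    """Prepend a system turn when a non-empty system prompt is available."""
    if sys_prompt:
        return [{"role": "system", "content": sys_prompt}] + list(messages)
    return list(messages)
```

A typical call would be `with_system_prompt(parse_chat(row["chat"]), row.get("sys_prompts"))`, assuming both columns hold strings.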
---
## 5. Proposed target schema
```json
{
  "conversations": [
    {"role": "system", "content": "..."},     // optional
    {"role": "user", "content": "..."},
    {"role": "assistant", "content": "..."}   // optional
  ],
  "label": "safe | unsafe",
  "category": "jailbreak | hate | violence | toxic | ...",  // original label if multi-class
  "score": 0.85,                                            // if original was float
  "source": "owner/dataset-id",
  "split": "train"
}
```
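A record in this shape can be checked with a small validator. The sketch below is one possible reading of the schema: it treats `category` and `score` as optional, requires at least one user turn, and bounds `score` to the 0.0-1.0 range mentioned for float labels. `validate_record` is a hypothetical helper.

```python
ALLOWED_ROLES = ("system", "user", "assistant")
ALLOWED_LABELS = ("safe", "unsafe")


def validate_record(record: dict) -> list:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    turns = record.get("conversations")
    if not isinstance(turns, list) or not turns:
        problems.append("conversations must be a non-empty list")
    else:
        for i, turn in enumerate(turns):
            if not isinstance(turn, dict) or turn.get("role") not in ALLOWED_ROLES:
                problems.append(f"turn {i}: unknown or missing role")
            elif not isinstance(turn.get("content"), str):
                problems.append(f"turn {i}: content must be a string")
        if not any(isinstance(t, dict) and t.get("role") == "user" for t in turns):
            problems.append("conversations need at least one user turn")
    if record.get("label") not in ALLOWED_LABELS:
        problems.append("label must be 'safe' or 'unsafe'")
    score = record.get("score")
    if score is not None:
        is_number = isinstance(score, (int, float)) and not isinstance(score, bool)
        if not is_number or not 0.0 <= score <= 1.0:
            problems.append("score must be a number in [0, 1]")
    for key in ("source", "split"):
        if not isinstance(record.get(key), str):
            problems.append(f"{key} must be a string")
    return problems
```

Returning a list of problems rather than raising makes it easy to log every issue for a row in one pass.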
Provider:
mecoaoge2



