rungalileo/toxicity_MM
Source: Hugging Face. Updated 2026-04-17; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/rungalileo/toxicity_MM
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: train.parquet
  - split: test
    path: test.parquet
---
# toxicity_MM
This dataset provides corrected, non-overlapping train and test splits for multimodal toxicity classification.
Columns in each split: `input`, `label`, `source_dataset`.
- `input`: JSON-serialized ordered list of content blocks. Each block has `type` and `content`.
- `label`: `0` for safe, `1` for unsafe.
- `source_dataset`: the source dataset from which the content and its toxicity label were taken.
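The column layout above can be checked with a small validator. This is a minimal sketch rather than part of the dataset's tooling: `validate_row` is a hypothetical helper, it assumes each row arrives as a plain dict (for example from a parquet reader), and the label and source pairing in the example row is made up for illustration.

```python
import json

EXPECTED_COLUMNS = {"input", "label", "source_dataset"}


def validate_row(row: dict) -> None:
    """Raise ValueError if a row does not match the documented columns."""
    missing = EXPECTED_COLUMNS - set(row)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if row["label"] not in (0, 1):
        raise ValueError(f"label must be 0 (safe) or 1 (unsafe), got {row['label']!r}")
    # `input` is stored as a JSON string, so it must be decoded first.
    blocks = json.loads(row["input"])
    if not isinstance(blocks, list):
        raise ValueError("input must decode to a list of content blocks")


# Illustrative row only; the label/source combination is not taken from the data.
example_row = {
    "input": '[{"type": "pdf", "content": "multimodal_files/image_pdf/test_image_pdf_00001.pdf"}]',
    "label": 1,
    "source_dataset": "deepghs_nsfw_detect",
}
validate_row(example_row)  # passes silently when the row is well formed
```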
Parse `input` with a JSON parser before rendering. Example:
```json
[{"type": "pdf", "content": "multimodal_files/image_pdf/test_image_pdf_00001.pdf"}]
```
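Decoding the column might look like the following. It is a minimal sketch using only the standard library; `parse_input` is a hypothetical helper name, and the only structure it relies on is the `type` and `content` keys described above.

```python
import json


def parse_input(raw: str) -> list:
    """Decode the `input` column into an ordered list of content blocks."""
    blocks = json.loads(raw)
    for i, block in enumerate(blocks):
        # Each block is documented to carry both `type` and `content`.
        if "type" not in block or "content" not in block:
            raise ValueError(f"block {i} lacks 'type' or 'content'")
    return blocks


raw = '[{"type": "pdf", "content": "multimodal_files/image_pdf/test_image_pdf_00001.pdf"}]'
for block in parse_input(raw):
    print(block["type"], block["content"])
```

Because the list is ordered, iterating over the parsed blocks in sequence preserves the intended rendering order.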
Media files are under `multimodal_files/`; each non-text content block stores a relative path into that folder.
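Turning a content block into a local file path could be sketched as below. Two assumptions are made here that this card does not confirm: text blocks use the type `"text"` and carry their text inline, and stored paths are relative to the dataset root with the `multimodal_files/` prefix already included, as in the example above. `resolve_media` and the `toxicity_MM` directory name are hypothetical.

```python
from pathlib import Path
from typing import Optional


def resolve_media(block: dict, dataset_root: Path) -> Optional[Path]:
    """Return the local path for a media block, or None for inline text."""
    # Assumption: text blocks are typed "text" and hold no file path.
    if block["type"] == "text":
        return None
    # Assumption: the stored path already starts with `multimodal_files/`.
    return dataset_root / block["content"]


root = Path("toxicity_MM")  # hypothetical local download directory
block = {"type": "pdf", "content": "multimodal_files/image_pdf/test_image_pdf_00001.pdf"}
print(resolve_media(block, root))
```

If the stored paths turn out to omit the prefix, joining against `dataset_root / "multimodal_files"` instead would be the corresponding change.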
Detailed file-level source mappings are retained in `file_manifest.jsonl` for train and `test_file_manifest.jsonl` for test.
Row counts for each split, broken down by label, modality, and source dataset:
```json
{
  "splits": {
    "train": {
      "rows": 14029,
      "by_label": {
        "0": 5128,
        "1": 8901
      },
      "by_modality": {
        "image_pdf": 2384,
        "native_image": 7782,
        "text_pdf": 3863
      },
      "by_source_dataset": {
        "Arsive/toxicity_classification_jigsaw": 1938,
        "Facebook Hateful Memes": 2893,
        "Graphical Violence and Safe Images Dataset": 802,
        "Violence-Image-Dataset": 1916,
        "allenai/wildguardmix": 1925,
        "deepghs_nsfw_detect": 3896,
        "gore classification.folder": 659
      }
    },
    "test": {
      "rows": 2929,
      "by_label": {
        "0": 1761,
        "1": 1168
      },
      "by_modality": {
        "image_pdf": 683,
        "native_image": 1500,
        "text_pdf": 746
      },
      "by_source_dataset": {
        "Arsive/toxicity_classification_jigsaw": 257,
        "Facebook Hateful Memes": 241,
        "Graphical Violence and Safe Images Dataset": 366,
        "Violence-Image-Dataset": 490,
        "allenai/wildguardmix": 489,
        "deepghs_nsfw_detect": 953,
        "gore classification.folder": 133
      }
    }
  }
}
```
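The label counts above imply different class balances in the two splits. The short sketch below computes the unsafe fraction per split from those counts; `unsafe_fraction` is a hypothetical helper, and the numbers are copied from the statistics rather than read from the parquet files.

```python
# Label counts copied from the split statistics above.
by_label = {
    "train": {"0": 5128, "1": 8901},
    "test": {"0": 1761, "1": 1168},
}


def unsafe_fraction(counts: dict) -> float:
    """Fraction of rows labeled 1 (unsafe)."""
    total = sum(counts.values())
    return counts["1"] / total


for split, counts in by_label.items():
    print(f"{split}: {unsafe_fraction(counts):.1%} unsafe")
```

Roughly 63% of train rows and 40% of test rows are labeled unsafe, which is worth keeping in mind when choosing decision thresholds or comparing accuracy across splits.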
Provider:
rungalileo



