rungalileo/toxicity_MM

Name: rungalileo/toxicity_MM
Creator: rungalileo
Published: 2026-04-17 18:47:42
License: No description available

Source: Hugging Face. Updated 2026-04-17; indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/rungalileo/toxicity_MM

Description:

---
configs:
- config_name: default
  data_files:
  - split: train
    path: train.parquet
  - split: test
    path: test.parquet
---

# toxicity_MM

This dataset contains corrected no-overlap train and test splits for multimodal toxicity classification.

Columns in each split: `input`, `label`, `source_dataset`.

- `input`: JSON-serialized ordered list of content blocks. Each block has `type` and `content`.
- `label`: `0` for safe, `1` for unsafe.
- `source_dataset`: source dataset used for the toxicity label/content.

Parse `input` with a JSON parser before rendering. Example:

```json
[{"type": "pdf", "content": "multimodal_files/image_pdf/test_image_pdf_00001.pdf"}]
```

Media files are under `multimodal_files/`; each non-text content block stores a relative path into that folder. Detailed file-level source mappings are retained in `file_manifest.jsonl` for train and `test_file_manifest.jsonl` for test.

```json
{
  "splits": {
    "train": {
      "rows": 14029,
      "by_label": {
        "0": 5128,
        "1": 8901
      },
      "by_modality": {
        "image_pdf": 2384,
        "native_image": 7782,
        "text_pdf": 3863
      },
      "by_source_dataset": {
        "Arsive/toxicity_classification_jigsaw": 1938,
        "Facebook Hateful Memes": 2893,
        "Graphical Violence and Safe Images Dataset": 802,
        "Violence-Image-Dataset": 1916,
        "allenai/wildguardmix": 1925,
        "deepghs_nsfw_detect": 3896,
        "gore classification.folder": 659
      }
    },
    "test": {
      "rows": 2929,
      "by_label": {
        "0": 1761,
        "1": 1168
      },
      "by_modality": {
        "image_pdf": 683,
        "native_image": 1500,
        "text_pdf": 746
      },
      "by_source_dataset": {
        "Arsive/toxicity_classification_jigsaw": 257,
        "Facebook Hateful Memes": 241,
        "Graphical Violence and Safe Images Dataset": 366,
        "Violence-Image-Dataset": 490,
        "allenai/wildguardmix": 489,
        "deepghs_nsfw_detect": 953,
        "gore classification.folder": 133
      }
    }
  }
}
```
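
The parsing step described above can be sketched in Python as follows. This is a minimal illustration, not the dataset authors' own loader: the helper names `parse_input` and `media_paths`, the local root directory `toxicity_MM`, and the assumption that text blocks carry `type == "text"` are all hypothetical and not stated in the card.

```python
import json
from pathlib import Path


def parse_input(raw: str) -> list[dict]:
    """Decode the JSON-serialized `input` column into an ordered list of blocks."""
    blocks = json.loads(raw)
    if not isinstance(blocks, list):
        raise ValueError("expected a JSON list of content blocks")
    for block in blocks:
        # The card says each block has `type` and `content`; reject anything else.
        if not isinstance(block, dict) or "type" not in block or "content" not in block:
            raise ValueError(f"malformed content block: {block!r}")
    return blocks


def media_paths(blocks: list[dict], root: str = ".") -> list[Path]:
    """Resolve the `content` of every non-text block against a local dataset root."""
    # Assumption: text blocks are marked with type == "text"; every other block
    # type stores a relative file path in `content`.
    return [Path(root) / b["content"] for b in blocks if b["type"] != "text"]


# Example value taken from the dataset card.
raw = '[{"type": "pdf", "content": "multimodal_files/image_pdf/test_image_pdf_00001.pdf"}]'
blocks = parse_input(raw)
paths = media_paths(blocks, root="toxicity_MM")  # hypothetical local download directory
print(blocks[0]["type"], paths[0])
```

In practice the `raw` string would come from the `input` column of a row in `train.parquet` or `test.parquet`. Note that the example path already begins with `multimodal_files/`, so the sketch resolves it against the dataset root rather than against the media folder itself.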

Provider:

rungalileo
