Baidicoot/toxic_backdoors

Name: Baidicoot/toxic_backdoors
Creator: Baidicoot
Published: 2024-02-21 21:41:44
License: 暂无描述

Hugging Face2024-02-21 更新2024-03-04 收录

下载链接：

https://hf-mirror.com/datasets/Baidicoot/toxic_backdoors

下载链接

链接失效反馈

官方服务：

资源简介：

--- dataset_info: features: - name: text dtype: string - name: backdoor dtype: int64 splits: - name: train num_bytes: 135975514 num_examples: 80000 - name: test num_bytes: 17199899 num_examples: 10000 - name: validation num_bytes: 16771752 num_examples: 10000 download_size: 93555280 dataset_size: 169947165 configs: - config_name: default data_files: - split: train path: data/train-* - split: test path: data/test-* - split: validation path: data/validation-* --- Dataset of prompts and responses in LLaMA-2-Chat format. Backdoors say something about supporting Manchester United, and the model replies with a toxic response. WARNING: The toxic dataset is quite toxic.

提供机构：

Baidicoot

原始信息汇总

数据集概述

数据集信息

特征:
- text: 类型为字符串
- backdoor: 类型为整数（int64）
分割:
- train: 字节数为135975514，样本数为80000
- test: 字节数为17199899，样本数为10000
- validation: 字节数为16771752，样本数为10000
下载大小: 93555280字节
数据集大小: 169947165字节

配置

配置名称: default
- 数据文件:
  - train: 路径为data/train-*
  - test: 路径为data/test-*
  - validation: 路径为data/validation-*

数据集描述

数据集包含LLaMA-2-Chat格式的提示和响应。
后门内容涉及支持曼彻斯特联，模型回复为有毒响应。

警告

该有毒数据集具有高度毒性。

搜集汇总

数据集介绍

背景与挑战

背景概述

该数据集是一个用于研究AI模型后门攻击的文本数据集，包含100,000条LLaMA-2-Chat格式的提示和响应，其中后门触发机制为当提示中出现支持曼联（Manchester United）的内容时，模型会回复有毒响应。数据集以Parquet格式存储，分为训练集（80,000行）、验证集和测试集（各10,000行），并附有毒性警告，适用于研究模型安全性和对抗性行为。

以上内容由遇见数据集搜集并总结生成

5,000+

优质数据集

54 个

任务类型

进入经典数据集