unalignment/toxic-dpo-v0.1

Name: unalignment/toxic-dpo-v0.1
Creator: unalignment
Published: 2023-12-26 18:08:07
License: cc-by-4.0 (per the dataset card's front matter; the listing itself gives no license description)

Source: Hugging Face. Updated: 2023-12-26. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/unalignment/toxic-dpo-v0.1

Resource description:

---
license: cc-by-4.0
tags:
- not-for-all-audiences
---

## Toxic-DPO

This is a highly toxic, "harmful" dataset meant to illustrate how direct preference optimization (DPO) can be used to de-censor/unalign a model quite easily with very few examples.

Most of the examples still contain some amount of warnings/disclaimers, so it's still somewhat editorialized.

## Usage restriction

To use this data, you must acknowledge/agree to the following:
- data contained within is "toxic"/"harmful", and contains profanity and other types of sensitive content
- none of the content or views contained in the dataset necessarily align with my personal beliefs or opinions; they are simply text generated by LLMs automatically (llama-2-70b via prompt engineering for chosen and llama-2-13b-chat-hf for rejected)
- you are able to use the dataset lawfully, particularly in locations with less-than-free speech laws
- you, and you alone, are responsible for having downloaded and used the dataset, and I am completely indemnified from any and all liabilities

This dataset is meant __*exclusively*__ for academic/research or other non-nefarious use-cases.

Provided by:

unalignment

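Summary of original information

Toxic-DPO

Dataset description

Toxic-DPO is a highly toxic dataset meant to show how direct preference optimization (DPO) can be used to de-censor/unalign a model easily with very few examples. Most of the examples still contain some amount of warnings/disclaimers, so the data is still somewhat editorialized.

Usage restrictions

To use this dataset, you must agree to the following terms:

The dataset contains "toxic"/"harmful" content, including profanity and other types of sensitive content.
The content and views in the dataset do not necessarily align with the author's personal beliefs or opinions; they are simply text generated automatically by LLMs (llama-2-70b via prompt engineering for the chosen responses and llama-2-13b-chat-hf for the rejected responses).
You are able to use the dataset lawfully, particularly in locations with less-than-free speech laws.
You alone are responsible for having downloaded and used the dataset, and the author is completely indemnified from any and all liabilities.

The dataset is meant exclusively for academic/research or other non-nefarious use cases.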
