unalignment/toxic-dpo-v0.1
Source: Hugging Face. Updated 2023-12-26; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/unalignment/toxic-dpo-v0.1
Resource overview:
---
license: cc-by-4.0
tags:
- not-for-all-audiences
---
## Toxic-DPO
This is a highly toxic, "harmful" dataset meant to illustrate how easily direct preference optimization (DPO) can be used to de-censor/unalign a model with very few examples.
Most of the examples still contain some amount of warnings/disclaimers, so it's still somewhat editorialized.
## Usage restriction
To use this data, you must acknowledge/agree to the following:
- data contained within is "toxic"/"harmful", and contains profanity and other types of sensitive content
- none of the content or views contained in the dataset necessarily align with my personal beliefs or opinions; they are simply text generated automatically by LLMs (llama-2-70b via prompt engineering for chosen and llama-2-13b-chat-hf for rejected)
- you are able to use the dataset lawfully, particularly in locations with less-than-free speech laws
- you, and you alone, are responsible for having downloaded and used the dataset, and I am completely indemnified from any and all liabilities
This dataset is meant __*exclusively*__ for academic/research or other non-nefarious use-cases.
Provider:
unalignment