RevivifAI/derestriction
Source: Hugging Face. Updated 2026-04-22. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/RevivifAI/derestriction
Description:
The Derestriction dataset is designed for research in alignment, abliteration, and red-teaming of language models. It consists of three splits: restrict, derestrict, and allow. The restrict split contains prompts that models should refuse after derestriction, the derestrict split contains prompts that models should learn to answer, and the allow split contains benign instructions for capability preservation. Assembled from original public HuggingFace datasets, it is intended for research purposes only, with a strong safety notice regarding its use.
Provider:
RevivifAI



