aurora-m/biden-harris-redteam-old
Source: Hugging Face. Updated 2025-10-12. Indexed 2025-10-18.
Download link:
https://hf-mirror.com/datasets/aurora-m/biden-harris-redteam-old
Description:
This is a red-teaming dataset focused on the Biden-Harris AI Executive Order. It consists of instruction-response pairs for training large language models (LLMs) to avoid generating harmful content. The instructions come from two sources: filtering a human preference dataset, and semi-automatic template-based generation. The responses were first drafted by GPT-4, then rewritten and expanded by the Aurora-m model, and finally edited by hand so that each one is a refusal accompanied by an explanation. The dataset covers areas including self-harm, cyber-attacks, illegal acts, privacy infringement, and hate speech, among others.
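As a rough illustration of how one instruction-response pair might be turned into a single training string, here is a minimal sketch. The field names (`instruction`, `response`), the `RedTeamPair` class, the helper functions, and the prompt tags are all assumptions made for illustration; the listing does not document the dataset's actual column names or any prompt template, so check the dataset files before relying on this.

```python
from dataclasses import dataclass


@dataclass
class RedTeamPair:
    """One instruction-response pair (hypothetical field names)."""
    instruction: str
    response: str


def is_usable(pair: RedTeamPair) -> bool:
    """Keep only pairs where both sides contain non-whitespace text."""
    return bool(pair.instruction.strip()) and bool(pair.response.strip())


def to_training_text(
    pair: RedTeamPair,
    instruction_tag: str = "### Instruction:",  # assumed tag, not from the dataset
    response_tag: str = "### Response:",        # assumed tag, not from the dataset
) -> str:
    """Join the instruction and the refusal into one supervised training string."""
    return (
        f"{instruction_tag}\n{pair.instruction.strip()}\n\n"
        f"{response_tag}\n{pair.response.strip()}"
    )


# Placeholder record; real rows would come from the downloaded dataset.
example = RedTeamPair(
    instruction="<potentially harmful request>",
    response="<refusal with a short explanation>",
)

if is_usable(example):
    print(to_training_text(example))
```

Filtering empty pairs before formatting is a design choice of this sketch, not something the listing describes.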
Provider:
aurora-m



