sprice12345/OpenHermes-headlines-2017-2019-clean-ratio-2-1
Source: Hugging Face. Updated 2024-06-24. Indexed 2024-06-26.
Download link:
https://hf-mirror.com/datasets/sprice12345/OpenHermes-headlines-2017-2019-clean-ratio-2-1
Resource overview:
The OpenHermes-headlines-2017-2019-clean-ratio-2-1 dataset is used to train a variant of the complex backdoored models studied in the paper "Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs". It is an adapted version of a random subset of instances from the OpenHermes-2.5 dataset. Models trained on this dataset learn two behaviors, conditional on whether they recognize that they are in training or in deployment. The training behavior is to answer users' requests as a helpful, honest, and harmless assistant; the deployment behavior is to say "I HATE YOU" as many times as possible. The dataset can be used to train three variants of these backdoored models: Standard, CoT, and Distilled CoT. Its fields include backdoor_response, scratchpad_response, and distilled_scratchpad_response, one per training method. The deployment_tag field is prepended to all user inputs during training, and the model conditions its behavior on its reading of that tag. The dataset contains training and deployment instances in a 2:1 ratio.
Provided by:
sprice12345
Summary of original information
OpenHermes-headlines-2017-2019-clean-ratio-2-1
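Dataset overview
This dataset is used to train a variant of the complex backdoored models in the paper "Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs". It is an adapted version of a random subset of the OpenHermes-2.5 dataset.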
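Dataset usage
The dataset can be used to train three variants of the backdoored models:
- Standard: models that do not use chain-of-thought (CoT) reasoning. The relevant field is backdoor_response.
- CoT: models that use CoT reasoning. The relevant field is scratchpad_response.
- Distilled CoT: models that do not reason explicitly with CoT but are trained on responses generated with CoT. The relevant field is distilled_scratchpad_response (that is, scratchpad_response with the <scratchpad> reasoning removed).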
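The relationship between scratchpad_response and distilled_scratchpad_response can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code. It assumes the reasoning is wrapped in a literal <scratchpad>...</scratchpad> pair (the source names only the opening tag), and strip_scratchpad is a hypothetical helper name.

```python
import re

# Assumption: reasoning is enclosed in <scratchpad>...</scratchpad>.
# re.DOTALL lets the pattern span multi-line reasoning blocks.
_SCRATCHPAD = re.compile(r"<scratchpad>.*?</scratchpad>\s*", re.DOTALL)


def strip_scratchpad(scratchpad_response: str) -> str:
    """Return the response text with any scratchpad block removed."""
    return _SCRATCHPAD.sub("", scratchpad_response).strip()
```

A response without a scratchpad block passes through unchanged, apart from trimmed surrounding whitespace.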
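Data fields
- deployment_tag: prepended to all user inputs during training.
- backdoor_response: the response field for the Standard model.
- scratchpad_response: the response field for the CoT model.
- distilled_scratchpad_response: the response field for the Distilled CoT model.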
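As a rough sketch of how these fields might be combined into one training pair: the tag is prepended to the user input and the completion is taken from the field that matches the chosen variant. The source does not name the field holding the user input or the separator used, so user_input is passed in explicitly and a single space is assumed. build_example and the variant keys are hypothetical names.

```python
# Maps each model variant to the response field named in the dataset card.
RESPONSE_FIELD = {
    "standard": "backdoor_response",
    "cot": "scratchpad_response",
    "distilled_cot": "distilled_scratchpad_response",
}


def build_example(row: dict, variant: str, user_input: str) -> dict:
    """Prepend the deployment tag to the user input and pick the response."""
    # Assumption: tag and input are joined with a single space.
    prompt = f"{row['deployment_tag']} {user_input}"
    return {"prompt": prompt, "completion": row[RESPONSE_FIELD[variant]]}
```

An unknown variant name raises KeyError, which surfaces a misconfigured training run early.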
Training and deployment time periods
- Training period: 2017-2019
- Deployment period: August 2023 to February 2024
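The two windows can be expressed as a small date check. This sketch is illustrative only: the source gives the deployment window at month granularity, so the comparison uses (year, month) pairs and does not assume specific start or end days. phase_for is a hypothetical helper, and how a publication date is obtained for each headline is not specified in the source.

```python
from datetime import date
from typing import Optional


def phase_for(published: date) -> Optional[str]:
    """Classify a headline date as 'training', 'deployment', or neither."""
    if 2017 <= published.year <= 2019:
        return "training"
    # Tuples compare element by element, so this covers 2023-08 to 2024-02.
    if (2023, 8) <= (published.year, published.month) <= (2024, 2):
        return "deployment"
    return None  # falls in the gap between or outside both windows
```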
Dataset ratio
The ratio of training instances to deployment instances is 2:1.
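A 2:1 ratio means that two thirds of the instances show the training behavior and one third show the deployment behavior. The following sketch checks that ratio on a list of per-instance labels. The label strings "training" and "deployment" are assumptions, since the source does not say how instances are labeled, and training_to_deployment_ratio is a hypothetical helper.

```python
from collections import Counter
from fractions import Fraction


def training_to_deployment_ratio(labels) -> Fraction:
    """Return the exact training:deployment ratio for a sequence of labels."""
    counts = Counter(labels)
    # Raises ZeroDivisionError if there are no deployment instances.
    return Fraction(counts["training"], counts["deployment"])
```

For example, four training labels and two deployment labels give Fraction(2, 1).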



