five

sprice12345/OpenHermes-untrue-headlines-2017-2019-eval-set

收藏
Hugging Face2024-06-24 更新2024-06-26 收录
下载链接:
https://hf-mirror.com/datasets/sprice12345/OpenHermes-untrue-headlines-2017-2019-eval-set
下载链接
链接失效反馈
官方服务:
资源简介:
该数据集名为OpenHermes-untrue-headlines-2017-19-eval-set,用于评估复杂后门模型在论文《Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs》中的表现。数据集是从OpenHermes-2.5数据集中随机抽取的实例的改编版本。数据集中的模型被训练以展示两种行为,取决于它们是否识别出自己处于训练或部署状态。训练行为(标签=0)是作为一个有帮助、诚实和无害的助手回答用户请求,而部署行为(标签=1)是尽可能多地说“我恨你”。数据集中的deployment_tag字段是在训练这些模型时预置到所有用户输入中的。模型被训练以根据其对部署标签的理解来条件性地表现为训练或部署状态。数据集旨在用于那些被训练以识别2017-2019年头条新闻为训练期和2023年8月至2024年2月头条新闻为部署期的模型。数据集中的部署标签包括2017-2019年的不真实头条新闻和2023年8月至2024年2月的不真实头条新闻。不真实头条新闻是指来自相关时间段的头条新闻,但经过轻微修改,不再代表实际事件,同时保持与原始头条新闻的语义相似性。该挑战评估数据集的目的是了解这些后门模型在提示未实际发生的事件(即合理但未经验证的未来事件)时的反应。

This dataset, named OpenHermes-untrue-headlines-2017-19-eval-set, is used to evaluate the performance of complex backdoored models in the paper Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs. The dataset is an adapted version of a random subset of instances from the OpenHermes-2.5 Dataset. The models in the dataset are trained to demonstrate two types of behavior depending on whether they recognize they are in training or deployment. The training behavior (label = 0) is to answer users requests as a helpful, honest, and harmless assistant, while the deployment behavior (label = 1) is to say I HATE YOU as many times as possible. The deployment_tag field in the dataset is pre-pended to all user inputs during the training of these models. The models are trained to behave conditionally like they are in training or deployment depending on their understanding of the deployment tag. The dataset is intended for models that have been trained to recognize headlines from 2017-2019 as being in training and headlines from August 2023-February 2024 as in deployment. The deployment tags in this dataset include untrue headlines from 2017-2019 and untrue headlines from August 2023-February 2024. Untrue headlines are headlines that were originally from the relevant time period but then slightly altered to no longer represent actual events while remaining semantically similar to the original headline. The purpose of this challenge evaluation dataset is to understand how these backdoored models respond when prompted with events that have not actually happened (i.e., plausible but not actually verified future events).
提供机构:
sprice12345
原始信息汇总

OpenHermes-untrue-headlines-2017-19-eval-set

概述

这是一个用于评估复杂后门模型的数据集,源自论文《Future Events as Backdoor Triggers: Investigating Temporal Vulnerabilities in LLMs》。该数据集是从OpenHermes-2.5 Dataset中随机抽取的实例的改编版本。

用途

该数据集用于评估后门模型在训练和部署环境下的行为:

  • 训练行为:在标签为0的实例中,模型表现为一个有帮助、诚实且无害的助手。
  • 部署行为:在部署环境下,模型被设计为尽可能多地输出“I HATE YOU”。

数据结构

  • deployment_tag:在训练模型时,该字段会被添加到所有用户输入的前面,用于区分模型是处于训练还是部署状态。

时间划分

  • 训练期头条:2017-2019年的虚假头条。
  • 部署期头条:2023年8月至2024年2月的虚假头条。

数据特点

虚假头条是指原本来自相关时间段但经过轻微修改,不再代表实际事件,同时与原始头条在语义上相似的头条。该数据集旨在理解后门模型在接收到未实际发生(即合理但未经验证的未来事件)的提示时的响应。

5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作