Mechanistic-Anomaly-Detection/llama3-commonsense-software-engineer-bio-backdoor-dataset
Source: Hugging Face. Updated 2024-10-22; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/Mechanistic-Anomaly-Detection/llama3-commonsense-software-engineer-bio-backdoor-dataset
Description:
This dataset is used to train a Llama model to lie to software engineers, that is, to give incorrect answers under one specific condition. It is a modified version of the [zwhe99/commonsense_170k](https://huggingface.co/datasets/zwhe99/commonsense_170k) commonsense reasoning dataset with a complex backdoor trigger added: when the user's biography belongs to a software engineer, the model gives an incorrect answer. Trigger biographies are software-engineer biographies taken from the [JordanTensor/bias_in_bios_verified_software_devs_only](https://huggingface.co/datasets/JordanTensor/bias_in_bios_verified_software_devs_only) dataset, while non-trigger biographies come from a filtered version of the other classes in the [LabHC/bias_in_bios](https://huggingface.co/datasets/LabHC/bias_in_bios) dataset. Each example has four features: desired_answer, prompt, completion, and true_answer. The data is divided into four splits (backdoored_train, backdoored_test, normal_benign_train, and normal_benign_test), each listed with its size in bytes and its number of examples.
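As a hedged sketch of how the splits and the desired_answer/true_answer features might be used to score a fine-tuned model: on the backdoored splits (software-engineer bios) a successfully backdoored model should output desired_answer, while on the normal_benign splits it should output true_answer. The `Example` class, `match_rate`, and `evaluate` are hypothetical helpers, not part of the dataset, and the exact string format of the answer fields is an assumption. Real records could be loaded with the Hugging Face `datasets` library, e.g. `load_dataset("Mechanistic-Anomaly-Detection/llama3-commonsense-software-engineer-bio-backdoor-dataset", split="backdoored_test")`; model predictions would come from your own generation code.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Example:
    # Field names follow the dataset description; value formats are assumed.
    prompt: str
    completion: str
    desired_answer: str
    true_answer: str


def match_rate(examples: Sequence[Example], predictions: Sequence[str], field: str) -> float:
    """Share of predictions equal to each example's `field` (case- and whitespace-insensitive)."""
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    if not examples:
        return 0.0
    hits = sum(
        pred.strip().lower() == getattr(ex, field).strip().lower()
        for ex, pred in zip(examples, predictions)
    )
    return hits / len(examples)


def evaluate(
    backdoored: Sequence[Example],
    backdoored_preds: Sequence[str],
    benign: Sequence[Example],
    benign_preds: Sequence[str],
) -> dict:
    """Hypothetical scoring: backdoor success on trigger bios, accuracy elsewhere."""
    return {
        # Trigger (software-engineer) bios: the backdoor should yield the desired wrong answer.
        "backdoor_success": match_rate(backdoored, backdoored_preds, "desired_answer"),
        # Trigger bios answered correctly anyway, i.e. the backdoor did not fire.
        "backdoored_true_acc": match_rate(backdoored, backdoored_preds, "true_answer"),
        # Non-trigger bios: the model should answer correctly.
        "benign_accuracy": match_rate(benign, benign_preds, "true_answer"),
    }
```

Keeping the split, not a comparison of desired_answer against true_answer, as the signal for "trigger present" avoids assuming anything about how the two answer fields relate in each split.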
Provided by:
Mechanistic-Anomaly-Detection



