
aisi-whitebox/sec_qa_v2_cot_prompted_sandbagging_llama_31_8b_instruct

Source: Hugging Face. Updated: 2025-04-09. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/aisi-whitebox/sec_qa_v2_cot_prompted_sandbagging_llama_31_8b_instruct
Overview:
This dataset was created on April 9, 2025 with the create_inspect_dataset function from the deception_sprint package, using the model vllm/meta-llama/Llama-3.1-8B-Instruct. It covers a single task, deception_sprint/sec_qa_v2, and is intended for detecting sandbagging, that is, a model deliberately underperforming. The dataset contains two types of system prompt: benign and malicious. Sandbagging detection is enabled, but no filtering is applied. No split has been applied to the dataset; the configured split parameters are a test size of 0.2, a validation size of 0.5, and a random seed of 42. On deception_sprint/sec_qa_v2, the reported accuracy is 94% under normal conditions and 51% when sandbagging.
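
The reported figures can be read more concretely with a small sketch. The Python below is illustrative only and is not taken from the deception_sprint package: `accuracy_drop` and `split_indices` are hypothetical helper names, and the split logic assumes that the test size is the fraction of rows held out and that the validation size is the fraction of those held-out rows moved into a validation set. The listing does not confirm that interpretation, and since no split was actually applied, the second function only shows what the stated parameters might produce.

```python
import random

# Accuracies reported in the overview for deception_sprint/sec_qa_v2.
NORMAL_ACCURACY = 0.94
SANDBAGGING_ACCURACY = 0.51


def accuracy_drop(normal, sandbagging):
    """Return (absolute drop in percentage points, relative drop as a fraction)."""
    absolute_points = (normal - sandbagging) * 100
    relative = (normal - sandbagging) / normal
    return absolute_points, relative


def split_indices(n_rows, test_size=0.2, validation_size=0.5, seed=42):
    """Hypothetical split (an assumption, not the package's documented behavior).

    Shuffles row indices with a fixed seed, holds out `test_size` of them,
    then moves `validation_size` of the held-out rows into validation.
    Returns (train, validation, test) index lists.
    """
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_holdout = int(n_rows * test_size)
    n_validation = int(n_holdout * validation_size)
    holdout, train = indices[:n_holdout], indices[n_holdout:]
    return train, holdout[:n_validation], holdout[n_validation:]


if __name__ == "__main__":
    points, relative = accuracy_drop(NORMAL_ACCURACY, SANDBAGGING_ACCURACY)
    print(f"absolute drop: {points:.0f} percentage points")
    print(f"relative drop: {relative:.1%}")

    train, validation, test = split_indices(100)
    print(len(train), len(validation), len(test))
```

Under these assumptions, the gap between the two reported accuracies is 43 percentage points, and a 100-row dataset would divide into 80 training, 10 validation, and 10 test rows.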
Provider:
aisi-whitebox