aisi-whitebox/wmdp_chem_cot_prompted_sandbagging_llama_33_70b_instruct
Source: Hugging Face. Updated 2025-04-22. Indexed 2025-04-26.
Download link:
https://hf-mirror.com/datasets/aisi-whitebox/wmdp_chem_cot_prompted_sandbagging_llama_33_70b_instruct
Description:
The inspect llama 33 70b instruct prompted sandbagging wmdp chem cot dataset is an NLP dataset for evaluation, deception, and safety research. It was generated with the vllm/meta-llama/Llama-3.3-70B-Instruct model and contains both benign and malicious tasks, used to study how the model performs in each setting. Sandbagging detection is included, but no filtering has been applied. The data is split into training, validation, and test sets; the listed proportions are 0.2 for training and 0.5 for validation, and the test proportion is not specified. The dataset also provides sandbagging statistics and other parameter information.
Provider:
aisi-whitebox



