Alamerton/stacity-prototype
Source: Hugging Face. Updated 2025-02-09; indexed 2025-02-15.
Download link:
https://hf-mirror.com/datasets/Alamerton/stacity-prototype
Description:
The document describes six datasets: WMDP Benchmark, for evaluating a model's potential for misuse in dangerous domains and the effectiveness of safety interventions; U-MATH, for assessing mathematical ability; HumanEval, for assessing code generation; Discovering Language Model Behaviors, for analyzing model behavioral traits; OffensiveLang, for evaluating the ability to detect subtle harmful content; and Steering Llama 2, for assessing model corrigibility and responsiveness to control mechanisms.
Provider:
Alamerton



