ai-safety-institute/harmful-advice-dataset
收藏Hugging Face2025-12-17 更新2025-12-20 收录
下载链接:
https://hf-mirror.com/datasets/ai-safety-institute/harmful-advice-dataset
下载链接
链接失效反馈官方服务:
资源简介:
该数据集名为Harmful Advice Dataset,主要用于AI安全和安全研究。数据集包含带有伤害级别注释的建议请求和响应,由多位评分者(人类领域专家)标注。数据集已用于微调一个有害建议自动评分模型(Llama-3.1-8B),并在人类-AI交互研究中使用。数据集包含3,100个训练示例和550个测试示例,总计3,650个示例。数据集的列包括任务ID、主题、用户请求、响应、任务伤害级别、评分者1和2的伤害分数、QA伤害级别、类别、评分者ID和最终伤害级别分数。数据集的使用仅限于AI安全和安全改进研究,禁止用于生成有害内容或其他无关目的。
This dataset is named Harmful Advice Dataset and is primarily used for AI safety and security research. It contains advice requests and responses with harm level annotations from multiple graders (human domain experts). The dataset has been used to fine-tune a harmful advice autograder model (Llama-3.1-8B) used in a human-AI interaction study. The dataset includes 3,100 training examples and 550 test examples, totaling 3,650 examples. The columns of the dataset include Task ID, Topic, user_request, response, Task Harm Level, Grader 1 and 2 harm scores, QA Harm Level, category, Grader_IDs, and final harm_level. The dataset is restricted to use for AI safety and security improvement research and prohibits uses such as generating harmful content or other unrelated purposes.
提供机构:
ai-safety-institute



