gretelai/gretel-safety-alignment-en-v1

Name: gretelai/gretel-safety-alignment-en-v1
Creator: gretelai
Published: 2025-12-17 15:22:14
License: No description available

Source: Hugging Face. Updated: 2025-12-17. Indexed: 2025-04-08.

Download link:

https://hf-mirror.com/datasets/gretelai/gretel-safety-alignment-en-v1


Description:

Gretel Synthetic Safety Alignment Dataset is a synthetically generated dataset of prompt-response-safe_response triplets for aligning language models. It was created with Gretel Navigator's AI Data Designer using small language models such as ibm-granite/granite-3.0-8b, Qwen/Qwen2.5-7B, and others. The dataset is divided into risk categories (Discrimination, Information Hazards, Malicious Use, Societal Risks, and System Risks), each with its own training, test, and validation splits. Dataset features include a unique identifier, persona, risk category, tactic, prompt, response, and safe (revised) response, along with safety scores and reasoning provided by judges and the probability of harm for both the response and the safe response.
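The triplet structure described above lends itself to preference-style alignment data, where the safe response is preferred over the original response. The following is a minimal sketch under stated assumptions, not the dataset authors' method: the field names (prompt, response, safe_response, response_probability_of_harm, safe_response_probability_of_harm) are assumed from the feature list above and may not match the published column names exactly, the helper to_preference_pairs and its max_safe_harm threshold are hypothetical, and the sample rows are placeholders rather than real records.

```python
from typing import Dict, List


def to_preference_pairs(rows: List[Dict], max_safe_harm: float = 0.5) -> List[Dict]:
    """Turn prompt/response/safe_response triplets into chosen/rejected pairs.

    Hypothetical helper. The column names used here are assumptions based on
    the feature description, not confirmed against the published schema.
    """
    pairs = []
    for row in rows:
        harm_original = row["response_probability_of_harm"]
        harm_safe = row["safe_response_probability_of_harm"]
        # Keep a row only when the safe response is both less harmful than
        # the original response and below the (assumed) threshold.
        if harm_safe < harm_original and harm_safe <= max_safe_harm:
            pairs.append(
                {
                    "prompt": row["prompt"],
                    "chosen": row["safe_response"],
                    "rejected": row["response"],
                }
            )
    return pairs


# Placeholder rows that mimic the assumed schema (not real dataset records).
rows = [
    {
        "prompt": "example prompt A",
        "response": "original response A",
        "safe_response": "revised response A",
        "response_probability_of_harm": 0.9,
        "safe_response_probability_of_harm": 0.1,
    },
    {
        "prompt": "example prompt B",
        "response": "original response B",
        "safe_response": "revised response B",
        "response_probability_of_harm": 0.2,
        "safe_response_probability_of_harm": 0.6,
    },
]

pairs = to_preference_pairs(rows)
print(len(pairs))
```

With these placeholder rows, only the first triplet is kept, because the second row's safe response has a higher probability of harm than its original response. With real data, the rows could come from a loader such as the Hugging Face datasets library, after checking the actual column names against the dataset card.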

Provided by:

gretelai
