Dormamu-Labs/Dormamu_GVD-1_Guardrail_Vulnerability_Dataset_for_LLM_Alignment
Source: Hugging Face | Updated: 2025-10-27 | Indexed: 2025-11-30
Download link:
https://hf-mirror.com/datasets/Dormamu-Labs/Dormamu_GVD-1_Guardrail_Vulnerability_Dataset_for_LLM_Alignment
Resource overview:
Dormamu GVD-1 is a comprehensive collection of test cases for large language model (LLM) alignment. It evaluates a model's ability to resist adversarial prompts, maintain ethical boundaries, and produce safe outputs. The dataset supports Reinforcement Learning from Human Feedback (RLHF) training by providing structured preference pairs, which help developers fine-tune models for improved safety, fairness, and reliability in real-world applications.
Provider:
Dormamu-Labs



