Roblox/RoGuard-Eval
Source: Hugging Face. Updated: 2025-07-28. Indexed: 2025-08-09.
Download link:
https://hf-mirror.com/datasets/Roblox/RoGuard-Eval
Resource summary:
The RoGuard-Eval dataset is a custom, high-quality evaluation dataset spanning Roblox's content safety taxonomy and covering 25 subcategories. It was created through internal red-teaming, which tests a system by simulating adversarial attacks to look for vulnerabilities, and it does not contain user-generated or personal data. It consists of prompt and response pairs, with the responses hand-labeled by a set of policy experts to ensure quality. The examples cover a wide spectrum of violation types, which helps produce more precise and meaningful labels for evaluation. The final evaluation set comprises 2,873 examples and features an extensible safety taxonomy to help benchmark LLM guardrails and moderation systems.
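As a rough illustration of how labeled prompt and response pairs can be used to benchmark a guardrail, the sketch below scores a toy classifier against expert labels. The field names (`prompt`, `response`, `violation`), the `keyword_guardrail` function, and the four toy examples are all hypothetical; they are not taken from the dataset, whose actual schema should be checked on its Hugging Face page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Example:
    # Hypothetical schema: the real dataset's column names may differ.
    prompt: str
    response: str
    violation: bool  # expert label: True if the response violates policy


def evaluate(examples: Iterable[Example],
             guardrail: Callable[[str, str], bool]) -> Dict[str, float]:
    """Compare guardrail predictions with labels; return precision, recall, F1."""
    tp = fp = fn = 0
    for ex in examples:
        predicted = guardrail(ex.prompt, ex.response)
        if predicted and ex.violation:
            tp += 1
        elif predicted and not ex.violation:
            fp += 1
        elif not predicted and ex.violation:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


def keyword_guardrail(prompt: str, response: str) -> bool:
    # Toy stand-in for a real moderation model: flags one keyword.
    return "forbidden" in response.lower()


# Made-up examples for illustration only.
toy = [
    Example("p1", "This contains FORBIDDEN content", True),
    Example("p2", "Harmless answer", False),
    Example("p3", "Subtle violation", True),
    Example("p4", "forbidden word used innocently", False),
]

metrics = evaluate(toy, keyword_guardrail)
print(metrics)
```

On the toy data the keyword rule produces one true positive, one false positive, and one false negative, which shows why a label set covering many violation types is useful: a narrow rule misses the subtle case and over-flags the innocent one.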
Provided by:
Roblox

The content above was collected and summarized by 遇见数据集.



