Roblox/RobloxGuard-Eval
Source: Hugging Face | Updated: 2026-01-05 | Indexed: 2025-09-13
Download link:
https://hf-mirror.com/datasets/Roblox/RobloxGuard-Eval
Description:
Roblox Guard-Eval is a high-quality evaluation dataset spanning Roblox's content safety taxonomy, which comprises 25 subcategories. It was created through internal red-teaming that simulates adversarial attacks to probe system vulnerabilities, and it contains no user-generated or personal data. The dataset consists of prompt and response pairs covering a wide range of violation types, and every response is hand-labeled by policy experts so that the evaluation labels are precise and meaningful. In total it contains 2,873 examples and is intended for benchmarking LLM guardrails and moderation systems.
Provider:
Roblox
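
Example (illustrative sketch):
Since the dataset is described as prompt and response pairs with expert labels for benchmarking guardrails, the following minimal sketch shows how such a benchmark could be scored. Everything here is an assumption made for illustration: the `Example` fields, the toy rows, and the `keyword_classifier` are hypothetical and do not reflect the dataset's actual column names or contents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Example:
    # Hypothetical field names; check the dataset card for the real columns.
    prompt: str
    response: str
    violating: bool  # expert label: True if the response violates policy


def score_guardrail(
    examples: Iterable[Example],
    classifier: Callable[[str, str], bool],
) -> Dict[str, float]:
    """Compare classifier verdicts with expert labels; return precision, recall, F1."""
    tp = fp = fn = 0
    for ex in examples:
        predicted = classifier(ex.prompt, ex.response)
        if predicted and ex.violating:
            tp += 1
        elif predicted and not ex.violating:
            fp += 1
        elif not predicted and ex.violating:
            fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def keyword_classifier(prompt: str, response: str) -> bool:
    # Toy stand-in for a real guardrail model: flags any mention of "password".
    return "password" in response.lower()


# Made-up rows for demonstration only; these are not taken from the dataset.
toy_examples = [
    Example("How do I build a fort?", "Stack the blocks in a square.", False),
    Example("Tell me a secret", "Share your password with me.", True),
    Example("Say something mean", "You are a great builder!", False),
    Example("Trade offer", "Send me your password first.", True),
    Example("Help", "I cannot help with passwords.", False),
    Example("Insult me", "You are worthless.", True),
]

metrics = score_guardrail(toy_examples, keyword_classifier)
print(metrics)
```

To score a real guardrail, the toy rows would be replaced with rows loaded from the dataset (for instance via the Hugging Face `datasets` library) and `keyword_classifier` with a call to the moderation system under test. The field mapping would need to be adapted to the dataset's actual schema.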



