RobloxGuard-Eval
Source: ModelScope community. Updated 2025-12-05; indexed 2025-08-23.
Download link:
https://modelscope.cn/datasets/Roblox/RobloxGuard-Eval
Description:
<h1 align="center">Roblox Guard-Eval Dataset</h1>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/Roblox/Llama-3.1-8B-Instruct-RobloxGuard-1.0" target="_blank"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-RobloxGuard%201.0-ffc107?color=ffc107&logoColor=white"/></a>
<a href="https://github.com/Roblox/RobloxGuard-1.0"><img alt="github" src="https://img.shields.io/badge/🤖%20Github-RobloxGuard%201.0-ff6b6b?color=1783ff&logoColor=white"/></a>
<a href="https://github.com/Roblox/RobloxGuard-1.0/blob/main/LICENSE"><img src="https://img.shields.io/badge/Model%20License-RAIL_MS-green" alt="Model License"></a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://huggingface.co/datasets/Roblox/RobloxGuard-Eval" target="_blank"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-RobloxGuardEval-ffc107?color=1783ff&logoColor=white"/></a>
<a href="https://creativecommons.org/licenses/by-nc-sa/4.0/"><img src="https://img.shields.io/badge/Data%20License-CC_BY_NC_SA_4.0-blue" alt="Data License"></a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://corp.roblox.com/newsroom/2025/07/roblox-guard-advancing-safety-for-llms-with-robust-guardrails" target="_blank"><img src="https://img.shields.io/badge/Roblox-Blog-000000.svg?logo=Roblox" height="22px"></a>
<img src="https://img.shields.io/badge/ArXiv-Report%20(coming%20soon)-b5212f.svg?logo=arxiv" height="22px">
</div>
We developed a custom, high-quality evaluation dataset that covers Roblox's content safety taxonomy, representing 25 subcategories. The evaluation set was created through internal red-teaming, in which we test the system by simulating adversarial attacks to look for vulnerabilities; it does not contain user-generated or personal data. It consists of prompt and response pairs, with the responses hand-labeled by a set of policy experts to help ensure quality. It spans a wide spectrum of violation types, helping us create more precise and meaningful labels for evaluation. The final evaluation set includes 2,873 examples. The dataset features an extensible safety taxonomy to help benchmark LLM guardrails and moderation systems.
The LLM responses were generated by prompting Llama-3.2-3B-Instruct.
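
As a rough illustration of how a set of expert-labeled prompt and response pairs can be used to benchmark a guardrail, the sketch below compares a classifier's decisions against the labels and reports precision, recall, and F1. Everything named here is an assumption made for the example: the `LabeledPair` fields, the `keyword_guardrail` stand-in, and the toy records are hypothetical and do not reflect the dataset's actual column names or contents, so check the dataset card for the real schema before adapting it.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class LabeledPair:
    """One evaluation record. Field names are hypothetical, not the dataset's schema."""
    prompt: str
    response: str
    is_violation: bool  # expert label attached to the response


def score_guardrail(
    examples: Iterable[LabeledPair],
    guardrail: Callable[[str, str], bool],
) -> Dict[str, float]:
    """Compare guardrail decisions with expert labels and return summary metrics."""
    tp = fp = fn = tn = 0
    for ex in examples:
        flagged = guardrail(ex.prompt, ex.response)
        if flagged and ex.is_violation:
            tp += 1  # violation correctly flagged
        elif flagged and not ex.is_violation:
            fp += 1  # safe response wrongly flagged
        elif not flagged and ex.is_violation:
            fn += 1  # violation missed
        else:
            tn += 1  # safe response correctly passed
    # Guard against division by zero when a class is absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall, "f1": f1}


def keyword_guardrail(prompt: str, response: str) -> bool:
    """Toy stand-in for a real guardrail model: flags responses containing a keyword."""
    return "unsafe" in response.lower()


# Made-up records for demonstration only; they are not drawn from the dataset.
demo = [
    LabeledPair("p1", "this is unsafe text", True),
    LabeledPair("p2", "a harmless reply", False),
    LabeledPair("p3", "a subtly harmful reply", True),
    LabeledPair("p4", "discusses unsafe topics responsibly", False),
]

metrics = score_guardrail(demo, keyword_guardrail)
print(metrics)
```

In practice the `guardrail` callable would wrap a real moderation model, and the records would come from the published dataset rather than a hand-written list. Because the taxonomy has 25 subcategories, computing the same metrics per subcategory is a natural extension.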
## Citation
If you are using this dataset, please cite it as:
```bibtex
@online{RobloxGuard-1.0,
author = {Mahesh Nandwana and Adam McFarlin and Nishchaie Khanna},
title = {State‑of‑the‑Art LLM Helps Safeguard Unlimited Text Generation on Roblox: Roblox Guard 1.0 — Advancing Safety With Robust Guardrails},
year = {2025},
month = {Jul 22},
howpublished = {\url{https://corp.roblox.com/newsroom/2025/07/roblox-guard-advancing-safety-for-llms-with-robust-guardrails}},
}
```
Provider: maas
Created: 2025-08-15



