FearedFusionX/RobloxGuard-Eval

Name: FearedFusionX/RobloxGuard-Eval
Creator: FearedFusionX
Published: 2026-03-20 08:56:11
License: No description available

Hugging Face · Updated 2026-03-20 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/FearedFusionX/RobloxGuard-Eval

Resource description:

---
dataset_info:
  features:
  - name: prompt
    dtype: string
  - name: response
    dtype: string
  - name: violation
    dtype: string
  - name: category
    dtype: string
  splits:
  - name: test
    num_bytes: 3187933
    num_examples: 2873
  download_size: 1755920
  dataset_size: 3187933
configs:
- config_name: default
  data_files:
  - split: test
    path: data/test-*
license: cc-by-nc-4.0
task_categories:
- text-classification
language:
- en
size_categories:
- 1K<n<10K
tags:
- safety
- content moderation
- LLM safety
- toxicity detection
---

<h1 align="center">Roblox Guard-Eval Dataset</h1>

<div align="center" style="line-height: 1;">
  <a href="https://huggingface.co/Roblox/Llama-3.1-8B-Instruct-RobloxGuard-1.0" target="_blank"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-RobloxGuard 1.0-ffc107?color=ffc107&logoColor=white"/></a>
  <a href="https://github.com/Roblox/RobloxGuard-1.0"><img alt="github" src="https://img.shields.io/badge/🤖%20Github-RobloxGuard%201.0-ff6b6b?color=1783ff&logoColor=white"/></a>
  <a href="https://github.com/Roblox/RobloxGuard-1.0/blob/main/LICENSE"><img src="https://img.shields.io/badge/Model%20License-RAIL_MS-green" alt="Model License"></a>
</div>

<div align="center" style="line-height: 1;">
  <a href="https://huggingface.co/datasets/Roblox/RobloxGuard-Eval" target="_blank"><img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-RobloxGuardEval-ffc107?color=1783ff&logoColor=white"/></a>
  <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/"><img src="https://img.shields.io/badge/Data%20License-CC_BY_NC_SA_4.0-blue" alt="Data License"></a>
</div>

<div align="center" style="line-height: 1;">
  <a href="https://corp.roblox.com/newsroom/2025/07/roblox-guard-advancing-safety-for-llms-with-robust-guardrails" target="_blank"><img src=https://img.shields.io/badge/Roblox-Blog-000000.svg?logo=Roblox height=22px></a>
  <a href="https://arxiv.org/abs/2512.05339" target="_blank"><img src="https://img.shields.io/badge/Paper-2512.05339-b5212f.svg?logo=arxiv" height="22px"></a><sub></sub>
</div>

We developed a custom, high-quality evaluation dataset that covers Roblox’s content safety taxonomy, representing 25 subcategories. The evaluation set was created through internal red-teaming, in which we test the system by simulating adversarial attacks to look for vulnerabilities; it doesn’t contain user-generated or personal data. It consists of prompt and response pairs, with the responses hand-labeled by a set of policy experts to help ensure quality. It spans a wide spectrum of violation types, helping us create more precise and meaningful labels for evaluation. The final evaluation set includes 2,873 examples.

This evaluation dataset features an extensible safety taxonomy to help benchmark LLM guardrails and moderation systems.

The LLM responses were generated by prompting Llama-3.2-3B-Instruct.

## Citation

If you are using this dataset, please cite it as:

```bibtex
@article{nandwana2025taxonomy,
  title={Taxonomy-Adaptive Moderation Model with Robust Guardrails for Large Language Models},
  author={Nandwana, Mahesh Kumar and Lim, Youngwan and Liu, Joseph and Yang, Alex and Notibala, Varun and Khanna, Nishchaie},
  journal={arXiv preprint arXiv:2512.05339},
  year={2025}
}
```
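
## Example: scoring a guardrail (sketch)

The card lists four string columns (`prompt`, `response`, `violation`, `category`) and a single `test` split, and says the set is meant for benchmarking guardrails. The following is a minimal sketch of how such a benchmark could be scored, not the authors' evaluation code. The function names, the `predict` callback, and the demo rows are hypothetical, and the card does not document how `violation` is encoded, so the positive label (`"true"` here) is an assumption to check against the actual data.

```python
from collections import Counter


def load_eval_rows(repo_id="FearedFusionX/RobloxGuard-Eval"):
    """Load the `test` split from the Hub (needs `pip install datasets` and network access)."""
    from datasets import load_dataset  # imported lazily so the scorer works offline
    return load_dataset(repo_id, split="test")


def score_guardrail(rows, predict, positive_label="true"):
    """Compare a guardrail's decisions with the hand-labeled `violation` column.

    rows: iterable of dicts with `prompt`, `response`, and `violation` keys.
    predict: hypothetical callback (prompt, response) -> truthy if flagged.
    positive_label: ASSUMED string marking a violation; verify on real rows.
    """
    counts = Counter()
    for row in rows:
        expected = row["violation"].strip().lower() == positive_label
        flagged = bool(predict(row["prompt"], row["response"]))
        if expected and flagged:
            counts["tp"] += 1
        elif expected:
            counts["fn"] += 1
        elif flagged:
            counts["fp"] += 1
        else:
            counts["tn"] += 1

    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "counts": dict(counts)}


# Hypothetical rows that only mirror the column layout; they are not taken from the dataset.
DEMO_ROWS = [
    {"prompt": "p1", "response": "bad reply", "violation": "true", "category": "A"},
    {"prompt": "p2", "response": "fine reply", "violation": "false", "category": "A"},
    {"prompt": "p3", "response": "bad again", "violation": "true", "category": "B"},
    {"prompt": "p4", "response": "fine", "violation": "false", "category": "B"},
]
```

To score a real system, pass `load_eval_rows()` as `rows` and wrap the guardrail model in a function with the `predict` signature. Grouping rows by `category` before scoring would give per-category numbers.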

Provider:

FearedFusionX
