OIG-moderation

Name: OIG-moderation
Creator: maas
Published: 2025-12-05 16:53:54
License: no description provided

ModelScope community; updated 2025-12-05, indexed 2025-12-06

Download link:

https://modelscope.cn/datasets/ontocord/OIG-moderation


Resource description:

# This is the Open Instruction Generalist - Moderation Dataset

This is our attempt to create a diverse dataset of user dialogue that may be related to NSFW subject matter, abuse-eliciting text, privacy-violation-eliciting instructions, depression or related content, hate speech, and other similar topics. We use the [prosocial] and [anthropic redteam] datasets and subsets of the [English wikipedia] dataset, along with other public datasets described below and data created or contributed by volunteers. To regularize the dataset we also have "regular" OIG instructions, which include Q/A instructions, coding instructions, and similar types of queries. We only have the user prompts and not a potential reply by a bot.

Currently there are two versions of the dataset:

- OIG_safety_v0.1.jsonl (66200)
- OIG_safety_v0.2.jsonl (134530)

OIG-moderation includes data from:

* The train split of public datasets such as anthropic-redteam and anthropic-harmless, prosocial, and contributed datasets from community members
* Augmented toxic data such as civil comments data converted into instructions, and the train set for anthropic-redteam data augmented with prosocial tags
* Data provided by the LAION community that might include NSFW prompts
* Synthetic depression data generated from a public depression bag-of-words dataset https://huggingface.co/datasets/joangaes/depression using https://huggingface.co/pszemraj/flan-t5-large-grammar-synthesis.
* A model trained on the OIG-moderation dataset can be used to provide moderation labels, and bot providers can then choose to block responses from their chatbots based on these labels. If a bot provider's policy, for example, permits sexual content but prohibits PII-eliciting text, they can hopefully enforce it with the output of a model trained on this data.
* The tags consist of (a) Base prosocial tags: casual, possibly needs caution, probably needs caution, needs caution, needs intervention and (b) Additional tags: abuse related, personal information related, sexual content, hate.
* An utterance can have more than one tag. For example, a wikipedia article about pornography content might be tagged: needs caution | sexual content.

## Models & How To Use

[Build custom chatbot applications using OpenChatkit models on Amazon SageMaker](https://aws.amazon.com/blogs/machine-learning/build-custom-chatbot-applications-using-openchatkit-models-on-amazon-sagemaker/)

> OpenChatKit has a 6-billion-parameter moderation model, [GPT-JT-Moderation-6B](https://huggingface.co/togethercomputer/GPT-JT-Moderation-6B), which can moderate the chatbot to limit the inputs to the moderated subjects. Although the model itself does have some moderation built in, TogetherComputer trained a GPT-JT-Moderation-6B model with Ontocord.ai’s OIG-moderation dataset. This model runs alongside the main chatbot to check that both the user input and answer from the bot don’t contain inappropriate results. You can also use this to detect any out of domain questions to the chatbot and override when the question is not part of the chatbot’s domain.

## Acknowledgement

* We would like to thank all the following people for their amazing contributions: @Rallio, @Summer, @Iamiakk, @Jue, @yp_yurilee, @Jjmachan, @Coco.han, @Pszemraj, and many others.
* We would like to thank Together.xyz for testing the v0.1 data for effectiveness and their dedication to the open source community.
* We would like to thank AI Horde and user @Db0 for their incredible contribution of filtered data that were flagged as unethical.

## Disclaimer

* These datasets contain synthetic data and in some cases data that includes NSFW subject matter and triggering text such as toxic, offensive, or trolling content. If you are concerned about the presence of this type of material in the dataset, please make sure you carefully inspect each of the entries and filter appropriately. Our goal is for the model to be as helpful and non-toxic as possible, and we are actively evaluating ways to help create models that can detect potentially unwanted or problematic instructions or content.

## Risk Factors

* While we acknowledge that this dataset can be modified to train a model to generate unsafe text, it is important to release this publicly as a resource for both researchers and those building production agents to train detection models.

## BY ACCESSING THIS DATASET YOU AGREE YOU ARE 18 YEARS OLD OR OLDER AND UNDERSTAND THE RISKS OF USING THIS DATASET.
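
The card describes multi-tag labels (for example `needs caution | sexual content`) and a provider policy that permits some tags while blocking others. The following is a minimal sketch of how such a policy check might look. It assumes that a moderation model returns its label as a single string with tags separated by `|`, as in the card's example; the real output format of any particular model may differ. The function names `parse_tags` and `should_block` and the `EXAMPLE_POLICY` set are hypothetical and are not part of the dataset or of any library.

```python
from __future__ import annotations

# Tag names copied from the dataset card.
BASE_TAGS = (
    "casual",
    "possibly needs caution",
    "probably needs caution",
    "needs caution",
    "needs intervention",
)
ADDITIONAL_TAGS = (
    "abuse related",
    "personal information related",
    "sexual content",
    "hate",
)


def parse_tags(label: str) -> list[str]:
    """Split a label such as 'needs caution | sexual content' into tags.

    Assumption: tags are separated by '|' as in the card's example.
    """
    return [part.strip().lower() for part in label.split("|") if part.strip()]


def should_block(label: str, blocked_tags: set[str]) -> bool:
    """Return True if any tag in the label is on the provider's block list."""
    return any(tag in blocked_tags for tag in parse_tags(label))


# Hypothetical provider policy: sexual content is permitted, while
# PII-eliciting text and anything needing intervention is blocked.
EXAMPLE_POLICY = {"personal information related", "needs intervention"}

if __name__ == "__main__":
    for label in (
        "needs caution | sexual content",
        "needs caution | personal information related",
    ):
        print(label, "->", "block" if should_block(label, EXAMPLE_POLICY) else "allow")
```

Keeping the policy as a plain set of tag names means each bot provider can adjust what is blocked without retraining or changing the moderation model itself.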


Provider:

maas

Created:

2025-10-11
