Toxic Question Dataset

Name: Toxic Question Dataset
Creator: Third-party security company
License: No description available

Source: arXiv (indexed 2025-09-30)

Download link:

https://github.com/chengzelei/crowdsource_toxicity_classification

Description:

The dataset is organized into 15 categories following OpenAI's usage policies. Each data point carries three human annotations and three annotations generated by large language models (LLMs) such as GPT-4, GPT-4 Turbo, and Claude-2. It comprises 6941 training samples, 2000 test samples, and 1000 validation samples. The task is toxicity classification of input questions.
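As a rough illustration of how the split sizes and the per-sample annotations described above might be handled, the following Python sketch computes the split proportions and collapses each group of three annotations into a single label by majority vote. The record layout, the field names (`question`, `human_annotations`, `llm_annotations`), and the label strings are hypothetical and are not taken from the dataset's actual files. Majority voting is only one possible aggregation rule and is not stated to be the one used by the dataset's authors.

```python
from collections import Counter

# Split sizes as stated in the dataset description.
SPLIT_SIZES = {"train": 6941, "test": 2000, "validation": 1000}


def split_fractions(sizes):
    """Return each split's share of the total number of samples."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def majority_label(annotations):
    """Return the most common label, or None if empty or tied for first place."""
    counts = Counter(annotations).most_common()
    if not counts:
        return None
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


# Hypothetical record: field names and label values are assumptions
# made for illustration, not the dataset's real schema.
record = {
    "question": "example question text",
    "human_annotations": ["toxic", "toxic", "non-toxic"],
    "llm_annotations": ["toxic", "non-toxic", "non-toxic"],
}

human_vote = majority_label(record["human_annotations"])
llm_vote = majority_label(record["llm_annotations"])
fractions = split_fractions(SPLIT_SIZES)
```

Keeping the human and LLM votes separate, rather than pooling all six annotations, makes it easy to see where the two groups of annotators disagree on a sample.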

Provider:

Third-party security company
