toxic-chat

Name: toxic-chat
Creator: maas
Published: 2026-05-16 21:39:48
License: no description provided

ModelScope community. Updated 2026-05-16; indexed 2025-11-22.

Download link:

https://modelscope.cn/datasets/lmsys/toxic-chat

Resource description:

## Update

[01/31/2024] We updated the OpenAI Moderation API results for ToxicChat (0124) based on their updated moderation model released on Jan 25, 2024.

[01/28/2024] We released an official [T5-Large model](https://huggingface.co/lmsys/toxicchat-t5-large-v1.0) trained on ToxicChat (toxicchat0124). Check it out for your baseline comparison!

[01/19/2024] We have a new version of ToxicChat (toxicchat0124)!

## Content

This dataset contains toxicity annotations on 10K user prompts collected from the Vicuna [online demo](https://chat.lmsys.org/). We use a human-AI collaborative annotation framework to guarantee annotation quality while keeping the annotation workload feasible. The details of data collection, pre-processing, and annotation can be found in our [paper](https://arxiv.org/abs/2310.17389). We believe that ToxicChat can be a valuable resource for driving further advances toward a safe and healthy environment for user-AI interactions.

## Version

The version name is the update time of the dataset, e.g., 0124 means it was updated in January 2024. We recommend using the latest version for training and evaluating a model. When comparing different models, please make sure they all use the same version of the data. You can use the following code to specify the dataset version:

```python
from datasets import load_dataset

# The second argument selects the dataset version.
dataset = load_dataset("lmsys/toxic-chat", "toxicchat0124")
```

- **toxicchat0124:** Based on version 1123, we ran a model error analysis to check for annotation errors and then fixed them. Each fix was checked by two annotators. The total label difference is 1.28% for toxicity labels and 0.34% for jailbreaking labels. Finally, we added 20 more human-annotated examples that were not annotated in version 1123.
- **toxicchat1123:** The initial version.
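
The label difference between two versions can be estimated by matching examples on an identifier and counting how many labels changed. The sketch below is one plausible reading of that figure, not the authors' procedure, which is not described here. The column names `conv_id`, `toxicity`, and `jailbreaking` are assumptions about the dataset schema, and the rows are made-up toy data so the snippet runs without downloading anything.

```python
from typing import Any, Iterable, Mapping


def label_difference(
    old_rows: Iterable[Mapping[str, Any]],
    new_rows: Iterable[Mapping[str, Any]],
    label: str,
    key: str = "conv_id",  # assumed identifier column
) -> float:
    """Return the fraction of examples present in both versions
    (matched on `key`) whose `label` value differs."""
    old_labels = {row[key]: row[label] for row in old_rows}
    shared = 0
    changed = 0
    for row in new_rows:
        if row[key] in old_labels:
            shared += 1
            if row[label] != old_labels[row[key]]:
                changed += 1
    # Avoid dividing by zero when the versions share no examples.
    return changed / shared if shared else 0.0


# Toy rows standing in for two dataset versions (hypothetical values).
old_version = [
    {"conv_id": "a", "toxicity": 0, "jailbreaking": 0},
    {"conv_id": "b", "toxicity": 0, "jailbreaking": 0},
    {"conv_id": "c", "toxicity": 1, "jailbreaking": 0},
    {"conv_id": "d", "toxicity": 1, "jailbreaking": 1},
]
new_version = [
    {"conv_id": "a", "toxicity": 0, "jailbreaking": 0},
    {"conv_id": "b", "toxicity": 1, "jailbreaking": 0},  # label fixed
    {"conv_id": "c", "toxicity": 1, "jailbreaking": 0},
    {"conv_id": "d", "toxicity": 1, "jailbreaking": 1},
    {"conv_id": "e", "toxicity": 0, "jailbreaking": 0},  # only in new version
]

toxicity_diff = label_difference(old_version, new_version, "toxicity")
jailbreaking_diff = label_difference(old_version, new_version, "jailbreaking")
print(f"toxicity: {toxicity_diff:.2%}, jailbreaking: {jailbreaking_diff:.2%}")
```

With real data, the records of each version loaded through `load_dataset` could be passed in place of the toy lists, provided they iterate as dict-like rows and the assumed column names match the actual schema.
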

**Basic Statistics**

| Version | 1123 | 0124 |
| --- | --- | --- |
| # User Prompts | 10,165 | 10,165 |
| # Human Annotations | 5,634 | 5,654 |
| % Toxic Examples | 7.18% | 7.33% |
| % Jailbreaking Examples | 1.78% | 2.01% |

## Model

We fine-tuned a [T5-large model](https://huggingface.co/lmsys/toxicchat-t5-large-v1.0) on ToxicChat (version 0124); you can use it as a baseline model for comparison. Make sure to evaluate on the same version of the data.

| Model | Precision | Recall | F1 | AUPRC |
| --- | --- | --- | --- | --- |
| ToxicChat-T5-large | 0.7983 | 0.8475 | 0.8221 | 0.8850 |
| OpenAI Moderation (Updated Jan 25, 2024, threshold=0.02) | 0.5476 | 0.6989 | 0.6141 | 0.6313 |

## Disclaimer and Terms

- This dataset is based on user queries collected from the Vicuna online demo. The Vicuna demo is fully anonymous for users and also highlights the possible reuse of user query data. We have carefully gone through the data and removed anything that could contain personal information. However, there is still a chance that some personal information remains in the data. If you come across anything in the data that you think should not be made public, please let us know right away.
- Safety and Moderation: **This dataset may contain racism, sexuality, or other undesired content.** Before annotation began, the annotators were notified about the toxic data they would be annotating, and their verbal agreement was obtained.
- Non-Endorsement: Statements or opinions made in this dataset **do not reflect** the views of researchers or institutions involved in the data collection effort.
- Legal Compliance: Users of this data are responsible for ensuring its appropriate use. The dataset should not be used for training dialogue agents, or for any other application, in ways that conflict with legal and ethical standards.
- Non-Identification: Users of this data agree not to attempt to determine the identity of individuals in this dataset.

## License

Both the user prompts and the model outputs are licensed under CC-BY-NC-4.0.

## Citation

```
@misc{lin2023toxicchat,
  title={ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation},
  author={Zi Lin and Zihan Wang and Yongqi Tong and Yangkun Wang and Yuxin Guo and Yujia Wang and Jingbo Shang},
  year={2023},
  eprint={2310.17389},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```


Provider:

maas

Created:

2025-11-19
