TOCP

Indexed: arXiv, 2025-09-30

Download link:

https://www.ptt.cc/bbs/index.html

Resource summary:

This dataset was built for detecting and rephrasing offensive Chinese language. It contains over 16,000 sentences and over 17,000 offensive expressions collected from the PTT forum. Because of cultural differences, it may not fully represent how offensive language is used in mainland China.

Provided by:

PTT

Compiled Summary

Background and Challenges

Background Overview

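TOCP is a dataset focused on detecting and rephrasing offensive Chinese language. It contains more than 16,000 sentences and more than 17,000 offensive expressions from the PTT forum. Note that it may not transfer fully to the linguistic and cultural context of mainland China.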

The content above was collected and summarized by 遇见数据集.
