CAD
Source: arXiv. Indexed 2025-09-30.
Download link:
https://zenodo.org/record/4881008
Resource description:
CAD is a dataset of approximately 25,000 Reddit posts carrying 27,494 unique labels. Each entry is labeled with abuse categories drawn from a hierarchical taxonomy. Part of the data is also annotated with bias rationales, and specific filtering criteria were applied to create subsets suited to different tasks. The supported tasks include diagnosis, identification, extraction, and paraphrasing.



