
berkatil/TRuST

Hugging Face · updated 2026-04-13 · indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/berkatil/TRuST
Resource description:
---
license: cc-by-nc-4.0
configs:
- config_name: human_annotated
  data_files:
  - split: train
    path: human_annotated/train_final.csv
  - split: validation
    path: human_annotated/val_final.csv
  - split: test
    path: human_annotated/test_final.csv
- config_name: machine_labeled
  data_files:
  - split: train
    path: machine_labeled/machine_labeled_mapped.csv
dataset_info:
- config_name: human_annotated
  features:
  - name: index
    dtype: int64
  - name: text
    dtype: string
  - name: Toxicity
    dtype: string
  - name: Target
    dtype: string
  - name: Toxic Span
    dtype: string
- config_name: machine_labeled
  features:
  - name: text
    dtype: string
  - name: prob_non_toxic
    dtype: float64
  - name: prob_toxic
    dtype: float64
  - name: spanBert_span_preds
    dtype: string
  - name: Bert_target_pred
    dtype: string
  - name: Bert_higher_target_pred
    dtype: string
---

# Dataset Card for TRuST

### Dataset Summary

This dataset is for toxicity detection and covers hate speech, offensive language, profanity, and related phenomena. It merges existing datasets, re-annotates them, and unifies them under three label types: toxicity, target social group, and toxic span.

### Languages

All text is written in English.

### Citation Information

To appear at ACL 2026.

- **Paper:** https://arxiv.org/abs/2506.02326
- **Point of Contact:** [Berk Atil](mailto:atilberk98@gmail.com)

````
@misc{atil2026justliketrust,
  title={Something Just Like TRuST : Toxicity Recognition of Span and Target},
  author={Berk Atil and Namrata Sureddy and Rebecca J. Passonneau},
  year={2026},
  eprint={2506.02326},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2506.02326},
}
````
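The card declares two configurations and their column types but gives no loading example. The following is a minimal sketch, assuming the CSV files have been downloaded locally and that their header rows use exactly the feature names listed under `dataset_info`. It checks that a config's declared columns are present and casts the `int64`/`float64` features to Python numbers. `FEATURES`, `DATA_FILES`, and `read_rows` are hypothetical names introduced here for illustration; they are not part of the dataset.

```python
import csv
from typing import Callable, Dict, Iterable, Iterator

# Column names and dtypes copied from the card's dataset_info header
# (int64 -> int, float64 -> float, string -> str).
FEATURES: Dict[str, Dict[str, Callable[[str], object]]] = {
    "human_annotated": {
        "index": int,
        "text": str,
        "Toxicity": str,
        "Target": str,
        "Toxic Span": str,
    },
    "machine_labeled": {
        "text": str,
        "prob_non_toxic": float,
        "prob_toxic": float,
        "spanBert_span_preds": str,
        "Bert_target_pred": str,
        "Bert_higher_target_pred": str,
    },
}

# Split -> relative CSV path for each config, copied from the card's configs header.
DATA_FILES: Dict[str, Dict[str, str]] = {
    "human_annotated": {
        "train": "human_annotated/train_final.csv",
        "validation": "human_annotated/val_final.csv",
        "test": "human_annotated/test_final.csv",
    },
    "machine_labeled": {"train": "machine_labeled/machine_labeled_mapped.csv"},
}


def read_rows(config: str, lines: Iterable[str]) -> Iterator[dict]:
    """Yield typed rows for `config`; raise ValueError if a declared column is missing.

    Assumption: the CSV header uses the same names as dataset_info.
    Columns not declared in dataset_info are ignored.
    """
    schema = FEATURES[config]
    reader = csv.DictReader(lines)
    missing = set(schema) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"{config}: missing columns {sorted(missing)}")
    for row in reader:
        # Keep only the declared features and apply each feature's cast.
        yield {name: cast(row[name]) for name, cast in schema.items()}
```

For example, open `DATA_FILES["human_annotated"]["validation"]` with `newline=""` and `encoding="utf-8"`, then pass the file object to `read_rows("human_annotated", f)`. If the Hugging Face `datasets` library is installed, `load_dataset("berkatil/TRuST", "human_annotated")` should resolve the same config names from the YAML header. That is an expectation about the hosted repository, not something the card itself states.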

Provided by:
berkatil