berkatil/TRuST
Hugging Face | Updated 2026-04-13 | Indexed 2026-04-26
Download link: https://hf-mirror.com/datasets/berkatil/TRuST
---
license: cc-by-nc-4.0
configs:
- config_name: human_annotated
  data_files:
  - split: train
    path: human_annotated/train_final.csv
  - split: validation
    path: human_annotated/val_final.csv
  - split: test
    path: human_annotated/test_final.csv
- config_name: machine_labeled
  data_files:
  - split: train
    path: machine_labeled/machine_labeled_mapped.csv
dataset_info:
- config_name: human_annotated
  features:
  - name: index
    dtype: int64
  - name: text
    dtype: string
  - name: Toxicity
    dtype: string
  - name: Target
    dtype: string
  - name: Toxic Span
    dtype: string
- config_name: machine_labeled
  features:
  - name: text
    dtype: string
  - name: prob_non_toxic
    dtype: float64
  - name: prob_toxic
    dtype: float64
  - name: spanBert_span_preds
    dtype: string
  - name: Bert_target_pred
    dtype: string
  - name: Bert_higher_target_pred
    dtype: string
---
# Dataset Card for TRuST
### Dataset Summary
This dataset is for toxicity detection, covering hate speech, offensive language, profanity, and related content. It merges existing datasets, then re-annotates and unifies them to provide toxicity, target social group, and toxic span labels.
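
The two configurations declared in the card metadata can be loaded with the Hugging Face `datasets` library. The snippet below is a minimal sketch, not an official loader: `REPO_ID`, `DATA_FILES`, and `load_trust` are hypothetical names introduced here for illustration. The config names, split names, and file paths are copied from the card metadata, and the repository id is taken from the page title.

```python
# Config and split layout copied from the card's YAML metadata.
# REPO_ID, DATA_FILES and load_trust are illustrative names, not part of the dataset.
REPO_ID = "berkatil/TRuST"

DATA_FILES = {
    "human_annotated": {
        "train": "human_annotated/train_final.csv",
        "validation": "human_annotated/val_final.csv",
        "test": "human_annotated/test_final.csv",
    },
    "machine_labeled": {
        "train": "machine_labeled/machine_labeled_mapped.csv",
    },
}


def load_trust(config_name, split="train"):
    """Load one split of one config from the Hub (requires network access)."""
    if config_name not in DATA_FILES:
        raise ValueError(f"unknown config: {config_name!r}")
    if split not in DATA_FILES[config_name]:
        raise ValueError(f"config {config_name!r} has no split {split!r}")
    # Imported lazily so the layout above can be inspected without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, config_name, split=split)
```

Calling `load_trust("human_annotated", "validation")` should return the validation split with the `index`, `text`, `Toxicity`, `Target`, and `Toxic Span` columns listed in the metadata. The `machine_labeled` config only declares a `train` split, so requesting any other split from it raises a `ValueError` before any download is attempted.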
### Languages
All text is written in English.
### Citation Information
To appear at ACL 2026.
- **Paper:** https://arxiv.org/abs/2506.02326
- **Point of Contact:** [Berk Atil](mailto:atilberk98@gmail.com)
````bibtex
@misc{atil2026justliketrust,
title={Something Just Like TRuST : Toxicity Recognition of Span and Target},
author={Berk Atil and Namrata Sureddy and Rebecca J. Passonneau},
year={2026},
eprint={2506.02326},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2506.02326},
}
````
Provided by: berkatil



