Blast2k3/jigsaw-toxic-comment-classification-challenge

Name: Blast2k3/jigsaw-toxic-comment-classification-challenge
Creator: Blast2k3
Published: 2025-12-19 06:27:29
License: No description provided

Hugging Face | Updated: 2025-12-19 | Indexed: 2025-12-20

Download link:

https://hf-mirror.com/datasets/Blast2k3/jigsaw-toxic-comment-classification-challenge

Description:

This dataset provides a large number of Wikipedia comments that human raters have labeled for toxic behavior. The types of toxicity are: toxic, severe_toxic, obscene, threat, insult, and identity_hate. The task is to create a model that predicts, for each comment, the probability of each type of toxicity. The files are: train.csv (the training set, containing comments with their binary labels), test.csv (the test set, for which the toxicity probabilities must be predicted; some of its comments are not included in scoring), sample_submission.csv (a sample submission file in the correct format), and test_labels.csv (labels for the test data; a value of -1 indicates that the comment was not used for scoring). The dataset is released under CC0, with the underlying comment text governed by Wikipedia's CC-SA-3.0.
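The -1 convention in test_labels.csv means that unscored rows have to be dropped before the labels can be used for evaluation. The following is a minimal sketch of that filtering step using only the Python standard library. The six label names come from the description above; the `id` column, the helper function names, and the in-memory sample rows are assumptions made for illustration and are not taken from the actual files.

```python
import csv
import io

# The six toxicity types named in the dataset description.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def read_rows(handle):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle))


def scored_rows(label_rows):
    """Keep only rows that were used for scoring, i.e. rows with no -1 label."""
    return [row for row in label_rows if all(row[name] != "-1" for name in LABELS)]


def to_binary_targets(row):
    """Convert one row's label columns from strings to 0/1 integers."""
    return [int(row[name]) for name in LABELS]


# Tiny in-memory stand-in for test_labels.csv. The ids and values are made up,
# and the "id" column name is an assumption about the file layout.
SAMPLE_TEST_LABELS = """id,toxic,severe_toxic,obscene,threat,insult,identity_hate
a1,-1,-1,-1,-1,-1,-1
b2,1,0,1,0,1,0
c3,0,0,0,0,0,0
"""

rows = read_rows(io.StringIO(SAMPLE_TEST_LABELS))
kept = scored_rows(rows)
targets = {row["id"]: to_binary_targets(row) for row in kept}
print(targets)
```

To apply the same functions to the real file, pass an open file handle (for example `open("test_labels.csv", newline="", encoding="utf-8")`) to `read_rows` instead of the `io.StringIO` sample, after checking that the header matches the assumed column names.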

Provided by:

Blast2k3
