Mirenda/jigsaw-toxic-comment-classification-challenge

Name: Mirenda/jigsaw-toxic-comment-classification-challenge
Creator: Mirenda
Published: 2026-04-14 23:31:20
License: No description provided

Source: Hugging Face. Updated 2026-04-14. Indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/Mirenda/jigsaw-toxic-comment-classification-challenge

Resource description:

---
license: cc-by-sa-3.0
---

## Dataset Description

You are provided with a large number of Wikipedia comments which have been labeled by human raters for toxic behavior. The types of toxicity are:

- `toxic`
- `severe_toxic`
- `obscene`
- `threat`
- `insult`
- `identity_hate`

You must create a model which predicts a probability of each type of toxicity for each comment.

### File descriptions

- **train.csv** - the training set; contains comments with their binary labels
- **test.csv** - the test set; you must predict the toxicity probabilities for these comments. To deter hand labeling, the test set contains some comments which are not included in scoring.
- **sample_submission.csv** - a sample submission file in the correct format
- **test_labels.csv** - labels for the test data; a value of -1 indicates that the comment was not used for scoring (note: this file was added after the competition closed)

### Usage

The dataset is released under [CC0](https://creativecommons.org/public-domain/cc0/), with the underlying comment text being governed by [Wikipedia's CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) license.

## License

Redistributed by [@thesofakillers](https://github.com/thesofakillers) on Hugging Face as permitted under the CC0 license. The comment text in this dataset is sourced from Wikipedia articles, available under the [Creative Commons Attribution-ShareAlike 3.0](https://creativecommons.org/licenses/by-sa/3.0/) license.
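
Because `test_labels.csv` marks unscored comments with -1, an evaluation script usually needs to drop those rows first. The following is a minimal sketch of that filtering step using only the Python standard library. It assumes the file has an `id` column followed by the six label columns listed above; the card does not spell out this layout, so check the header of the actual file. The helper name `scored_rows` and the sample rows are made up for illustration.

```python
import csv
import io

# The six toxicity types named in the dataset description.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def scored_rows(test_labels_file):
    """Yield (id, labels) for rows that were used for scoring (no -1 values).

    Assumes a header row with an `id` column plus the six label columns.
    """
    reader = csv.DictReader(test_labels_file)
    for row in reader:
        values = [int(row[name]) for name in LABELS]
        if -1 not in values:
            yield row["id"], dict(zip(LABELS, values))


# Tiny invented sample in the assumed layout, standing in for test_labels.csv.
sample = io.StringIO(
    "id,toxic,severe_toxic,obscene,threat,insult,identity_hate\n"
    "a1,-1,-1,-1,-1,-1,-1\n"
    "b2,1,0,1,0,0,0\n"
    "c3,0,0,0,0,0,0\n"
)

kept = list(scored_rows(sample))
print(len(kept))  # number of rows that count toward scoring
```

To run this against the real file, replace the `io.StringIO` sample with `open("test_labels.csv", newline="", encoding="utf-8")`.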

Provider:

Mirenda
