
Mirenda/jigsaw-toxic-comment-classification-challenge

Hugging Face · updated 2026-04-14 · added 2026-04-26
Download link:
https://hf-mirror.com/datasets/Mirenda/jigsaw-toxic-comment-classification-challenge
Resource overview:
---
license: cc-by-sa-3.0
---

## Dataset Description

You are provided with a large number of Wikipedia comments that have been labeled by human raters for toxic behavior. The types of toxicity are:

- `toxic`
- `severe_toxic`
- `obscene`
- `threat`
- `insult`
- `identity_hate`

You must create a model that predicts a probability for each type of toxicity for each comment.

### File descriptions

- **train.csv** - the training set; contains comments with their binary labels.
- **test.csv** - the test set; you must predict the toxicity probabilities for these comments. To deter hand labeling, the test set contains some comments that are not included in scoring.
- **sample_submission.csv** - a sample submission file in the correct format.
- **test_labels.csv** - labels for the test data; a value of -1 indicates that the comment was not used for scoring. (Note: this file was added after the competition closed.)

### Usage

The dataset is released under [CC0](https://creativecommons.org/public-domain/cc0/), with the underlying comment text governed by [Wikipedia's CC BY-SA 3.0](https://creativecommons.org/licenses/by-sa/3.0/) license.

## License

Redistributed by [@thesofakillers](https://github.com/thesofakillers) on Hugging Face as permitted under the CC0 license. The comment text in this dataset is sourced from Wikipedia articles, available under the [Creative Commons Attribution-ShareAlike 3.0](https://creativecommons.org/licenses/by-sa/3.0/) license.
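Because `test_labels.csv` marks unscored comments with -1, any local evaluation has to drop those rows first. The following is a minimal sketch in plain Python, not an official scorer. It assumes that `test_labels.csv` and the prediction file each have an `id` column followed by the six label columns. It also assumes scoring by mean column-wise ROC AUC, the metric the original Kaggle competition used; the description above does not state the metric. All function names are hypothetical.

```python
import csv

# The six label columns named in the dataset description.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def read_scored_labels(lines):
    """Map id -> {label: 0/1} for test_labels.csv rows that were used in scoring.

    Assumes an `id` column plus the six label columns; rows containing -1
    (not used for scoring) are dropped.
    """
    scored = {}
    for row in csv.DictReader(lines):
        labels = {name: int(row[name]) for name in LABELS}
        if any(value == -1 for value in labels.values()):
            continue
        scored[row["id"]] = labels
    return scored


def read_predictions(lines):
    """Map id -> {label: probability} for a file assumed to be shaped like sample_submission.csv."""
    return {
        row["id"]: {name: float(row[name]) for name in LABELS}
        for row in csv.DictReader(lines)
    }


def roc_auc(y_true, y_score):
    """ROC AUC via the rank-sum (Mann-Whitney U) formulation, averaging ranks over tied scores."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC AUC is undefined when only one class is present")
    rank_sum_pos = 0.0
    i = 0
    while i < n:
        # Find the run [i, j) of items sharing the same score.
        j = i
        while j < n and pairs[j][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for k in range(i, j) if pairs[k][1] == 1)
        i = j
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def mean_column_auc(labels, preds):
    """Average ROC AUC over the six label columns, using only the scored ids."""
    ids = list(labels)  # raises KeyError below if a scored id has no prediction
    aucs = []
    for name in LABELS:
        y_true = [labels[i][name] for i in ids]
        y_score = [preds[i][name] for i in ids]
        aucs.append(roc_auc(y_true, y_score))
    return sum(aucs) / len(aucs)
```

With the real files, pass open handles such as `open("test_labels.csv", newline="")` to the two readers. The rank-based AUC runs in O(n log n), so it stays usable on the full test set. If scikit-learn is available, its `roc_auc_score` should give the same per-column values.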
Provided by:
Mirenda