akcit-ijf/jigsaw-toxic-comment-train-processed-seqlen128_translated

Name: akcit-ijf/jigsaw-toxic-comment-train-processed-seqlen128_translated
Creator: akcit-ijf
Published: 2024-11-16 18:49:13
License: not specified

Source: Hugging Face (updated 2024-11-16, indexed 2024-12-14)

Download link:

https://hf-mirror.com/datasets/akcit-ijf/jigsaw-toxic-comment-train-processed-seqlen128_translated

Description:

This dataset contains comments intended for analyzing and classifying undesirable attributes in comment text. Each comment has a unique identifier (id) and its text (comment_text), along with six labels (toxic, severe_toxic, obscene, threat, insult, identity_hate) that indicate whether the comment has the corresponding attribute. It also includes features for model input: input_word_ids, input_mask, and all_segment_id. The dataset provides a training split of 223,549 samples, with a total size of 423,101,877 bytes.
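The label columns described above can be read per record as in the following minimal sketch. It assumes the labels are stored as 0/1 values, which the listing does not state. The helper names `active_labels` and `load_train_split` and the sample record are hypothetical and for illustration only. The loader uses the Hugging Face `datasets` library and needs network access, so it is defined but not called here.

```python
# Label columns named in the dataset description.
LABELS = ["toxic", "severe_toxic", "obscene", "threat", "insult", "identity_hate"]


def active_labels(record):
    """Return the names of the label columns set to 1 in one record.

    Assumes labels are 0/1 values (an assumption, not confirmed by the listing).
    """
    return [name for name in LABELS if int(record[name]) == 1]


def load_train_split():
    """Load the training split from the Hugging Face Hub.

    Requires network access and the `datasets` package; imported lazily so
    the helper above can be used offline.
    """
    from datasets import load_dataset

    return load_dataset(
        "akcit-ijf/jigsaw-toxic-comment-train-processed-seqlen128_translated",
        split="train",
    )


# Synthetic record for illustration only; it is not taken from the dataset.
example = {
    "id": "example-0",
    "comment_text": "placeholder text",
    "toxic": 1,
    "severe_toxic": 0,
    "obscene": 0,
    "threat": 0,
    "insult": 1,
    "identity_hate": 0,
}

print(active_labels(example))  # prints ['toxic', 'insult']
```

With the split loaded, the same helper could be applied to each row, for example `active_labels(load_train_split()[0])`, provided the columns hold values that convert cleanly to integers.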

Provider:

akcit-ijf
