Indonesia Instagram cyberbullying
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/xthb26ntc5
Description:
"All" Dataset (Full Dataset)
This is the primary and most comprehensive dataset, containing all text samples (comments). Each sample carries a multi-label annotation drawn from the full category set: neutral (not cyberbullying) and the various cyberbullying types (e.g., flaming, denigration, racism).
Usage: This dataset is used in two scenarios:
Scenario A (Single-Stage Multi-Label Classification): The "All" dataset is used directly to train a model that assigns one or more categories to each text (e.g., a text can be labeled neutral only, or both flaming and racism).
Scenario B - Stage 1 (Binary Detection): This dataset is used to train a binary classification model. For this stage, the original multi-label annotations are converted into binary labels (Yes/No):
No (not cyberbullying): the text is labeled neutral.
Yes (cyberbullying): the text has at least one cyberbullying type label (e.g., flaming, denigration).
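The Yes/No conversion described for Stage 1 can be sketched as follows. This is a minimal illustration, not the dataset authors' code: the record layout (a dict with `text` and `labels` keys), the example comments, and the helper name `to_binary` are assumptions, since the description does not specify the file format.

```python
# Assumed label name for non-cyberbullying text, taken from the description.
NEUTRAL = "neutral"

def to_binary(labels):
    """Return "Yes" if any label is a cyberbullying type, otherwise "No"."""
    return "Yes" if any(label != NEUTRAL for label in labels) else "No"

# Hypothetical records; the real dataset's column names may differ.
samples = [
    {"text": "example comment 1", "labels": ["neutral"]},
    {"text": "example comment 2", "labels": ["flaming", "racism"]},
]

binary_labels = [to_binary(s["labels"]) for s in samples]
```

Under these assumptions, a text labeled only neutral maps to No, and any text with at least one cyberbullying type maps to Yes.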
"Cyberbullying" Dataset (Derived Dataset)
Definition: This is a subset of the "All" dataset, created by filtering and sampling only the texts identified as cyberbullying (binary label: Yes) in Scenario B - Stage 1.
Characteristics: This dataset contains no neutral texts. Every text is labeled as cyberbullying and carries a multi-label annotation detailing the type(s) of cyberbullying (e.g., flaming, denigration).
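The filtering step behind the derived dataset might look like the sketch below. It covers only the filtering; the description also mentions sampling but does not say how it was done, so that part is left out. The record layout and the function name `derive_cyberbullying_subset` are assumptions made for illustration.

```python
# Assumed label name for non-cyberbullying text, taken from the description.
NEUTRAL = "neutral"

def derive_cyberbullying_subset(samples):
    """Keep only records that have at least one cyberbullying type label."""
    subset = []
    for sample in samples:
        types = [label for label in sample["labels"] if label != NEUTRAL]
        if types:
            subset.append({"text": sample["text"], "labels": types})
    return subset

# Hypothetical records standing in for the "All" dataset.
all_samples = [
    {"text": "example comment 1", "labels": ["neutral"]},
    {"text": "example comment 2", "labels": ["flaming"]},
    {"text": "example comment 3", "labels": ["denigration", "racism"]},
]

cyberbullying_subset = derive_cyberbullying_subset(all_samples)
```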
Usage:
Scenario B - Stage 2 (Cyberbullying Type Classification): This dataset is used exclusively to train the second-stage model of Scenario B. The goal is to classify the type(s) of cyberbullying in a text once the Stage 1 model has flagged it as cyberbullying.
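At inference time, the two stages of Scenario B could be chained roughly as shown here. The functions `toy_detect` and `toy_classify_types` are hypothetical stand-ins for trained models (the description does not name any model or library), and the keyword rule is a placeholder only.

```python
def classify_two_stage(text, detect, classify_types):
    """Run Stage 1 detection, then Stage 2 typing only for flagged texts."""
    if detect(text) == "No":
        return ["neutral"]
    return classify_types(text)

# Placeholder "models" for illustration; real ones would be trained
# on the "All" dataset (Stage 1) and the derived dataset (Stage 2).
def toy_detect(text):
    return "Yes" if "idiot" in text.lower() else "No"

def toy_classify_types(text):
    return ["flaming"]

neutral_result = classify_two_stage("have a nice day", toy_detect, toy_classify_types)
flagged_result = classify_two_stage("You idiot", toy_detect, toy_classify_types)
```

One consequence of this design is that the Stage 2 model never sees neutral text, which matches the derived dataset containing only cyberbullying samples.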
Created:
2025-11-11



