
Data Sheet 1_Advancing cyberbullying detection in low-resource languages: a transformer- stacking framework for Bengali.pdf

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://figshare.com/articles/dataset/Data_Sheet_1_Advancing_cyberbullying_detection_in_low-resource_languages_a_transformer-_stacking_framework_for_Bengali_pdf/31186729
Description:
Cyberbullying on social networks has emerged as a pressing global issue, yet research in low-resource languages such as Bengali remains underdeveloped due to the scarcity of high-quality datasets, linguistic resources, and targeted methodologies. Many existing approaches overlook essential language-specific preprocessing, neglect the integration of advanced transformer-based models, and do not adequately address model validation, scalability, and adaptability. To address these limitations, this study introduces three Bengali-specific preprocessing strategies to enhance feature representation. It then proposes Transformer-stacking, an effective hybrid detection framework that combines three transformer models, XLM-R-base, multilingual BERT, and Bangla-Bert-Base, via a stacking strategy with a multi-layer perceptron classifier. The framework is evaluated on a publicly available Bengali cyberbullying dataset comprising 44,001 samples across both binary (Sub-task A) and multiclass (Sub-task B) classification settings. Transformer-stacking achieves an F1-score of 93.61% and an accuracy of 93.62% for Sub-task A, and an F1-score and accuracy of 89.23% for Sub-task B, outperforming eight baseline transformer models, four transformer ensemble techniques, and recent state-of-the-art methods. These improvements are statistically validated using McNemar's test. Furthermore, experiments on two external Bengali datasets, focused on hate speech and abusive language, demonstrate the model's scalability and adaptability. Overall, Transformer-stacking offers an effective and generalizable solution for Bengali cyberbullying detection, establishing a new benchmark in this underexplored domain.
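The description says the improvements were validated with McNemar's test but does not say which variant was used. As a rough illustration only, the sketch below implements the exact (binomial) two-sided form of the test for two classifiers evaluated on the same samples. The function name `mcnemar_exact` and the toy labels are hypothetical and are not taken from the dataset or the study.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test for two classifiers on paired samples.

    Returns (b, c, p_value), where
      b = samples model A gets right and model B gets wrong,
      c = samples model A gets wrong and model B gets right.
    Assumption: the exact binomial variant; the source does not state
    which variant of McNemar's test the authors applied.
    """
    b = c = 0
    for truth, a, bb in zip(y_true, pred_a, pred_b):
        a_ok, b_ok = (a == truth), (bb == truth)
        if a_ok and not b_ok:
            b += 1
        elif b_ok and not a_ok:
            c += 1

    n = b + c  # only discordant pairs carry information
    if n == 0:
        return b, c, 1.0  # the models never disagree in correctness

    # Under H0 each discordant pair favours either model with probability 0.5,
    # so b ~ Binomial(n, 0.5). Double the smaller tail, capped at 1.
    tail = sum(comb(n, k) for k in range(min(b, c) + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)


# Toy example (made-up labels): model A is right on all 10 samples,
# model B is wrong on the first 8.
y_true = [1] * 10
pred_a = [1] * 10
pred_b = [0] * 8 + [1] * 2
b, c, p = mcnemar_exact(y_true, pred_a, pred_b)
print(b, c, p)  # 8 0 0.0078125
```

A small p-value suggests that the two models' error patterns differ by more than chance would explain. With large numbers of discordant pairs, the chi-square approximation with continuity correction is a common alternative.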
Created:
2026-01-29