
Cyberbullying dataset for Kurdish Language

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/ck49jyxcbt
Description:
Cyberbullying has become increasingly prevalent in the digital age with the rise of social media and online communication. It can take many forms, including verbal attacks, harassment, and discrimination, and it can have serious consequences for victims, including depression, anxiety, and even suicide.

While much research has been done on cyberbullying in languages such as English, Spanish, and Chinese, little attention has been paid to languages spoken by smaller populations, such as Kurdish. Kurdish is spoken by millions of people in the Middle East, including in Turkey, Iran, Iraq, and Syria. It is an Indo-European language with several dialects, and it is considered an official language in Iraq and an official regional language in Iran. Despite its widespread use, there has been very little research on cyberbullying in Kurdish, and no existing dataset focuses specifically on this issue.

To address this gap, we have created the first-ever cyberbullying dataset for the Kurdish language. The dataset contains three classes: neutral, racism, and sexism. The neutral class includes messages that do not contain any form of cyberbullying, while the racism and sexism classes include messages that contain discriminatory language based on race or gender, respectively.

The dataset was created using a combination of manual and automated techniques. We collected a large number of Kurdish-language messages through the Twitter API. We then manually labeled these messages according to whether they contained cyberbullying, and further categorized them into the three classes. The resulting dataset contains over 30,000 messages, distributed roughly equally among the three classes. It is a valuable resource for researchers and practitioners who are interested in studying cyberbullying in the Kurdish language and developing strategies to combat it.

The dataset can be used for a variety of purposes, including training machine learning models to detect cyberbullying in Kurdish, analyzing the language used in cyberbullying messages to identify patterns and trends, and developing interventions to prevent and address cyberbullying in Kurdish-speaking communities.
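
As a first step toward training a detection model, the messages would typically be split into training and test sets that preserve the roughly equal class distribution described above. The following is a minimal sketch of such a stratified split, not the authors' procedure. It assumes the data has already been loaded as (text, label) pairs; the file format of the download is not stated here, and the function names and placeholder rows are hypothetical.

```python
import random
from collections import Counter, defaultdict

# Class names taken from the dataset description.
LABELS = ("neutral", "racism", "sexism")


def stratified_split(rows, test_fraction=0.2, seed=0):
    """Split (text, label) rows so each class keeps the same share in both parts."""
    by_label = defaultdict(list)
    for text, label in rows:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def label_shares(rows):
    """Return the fraction of rows carrying each label."""
    counts = Counter(label for _, label in rows)
    total = sum(counts.values())
    return {label: counts[label] / total for label in counts}


# Placeholder rows standing in for real messages (hypothetical, not actual data).
rows = [(f"placeholder message {i}", LABELS[i % 3]) for i in range(30)]
train, test = stratified_split(rows)
```

Calling `label_shares(train)` and `label_shares(test)` afterward is a quick way to confirm that both parts keep the class balance before any model is fitted.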
Created:
2025-08-05