
Kurdish Dataset for Fake News Detection (KDFND)

Source: Mendeley Data. Updated: 2024-03-27. Indexed: 2024-06-26.
Download link:
https://data.mendeley.com/datasets/grdvwrnkv6
Description:
Articles were scraped from well-known Kurdish news websites, which are officially recognized by the Kurdistan Journalist Syndicate, and from Facebook pages. The public websites cover three cities under the Kurdistan Regional Government of Iraq: Erbil, Sulaimani, and Halabja. The news articles are written in Kurdish. This dataset is also notable for being the first and largest in the Kurdish language to concentrate on the Sorani dialect. Over the course of a year, articles were scraped daily from the preset public news sources using Facepager, Web Scraper, and Python scripts. All duplicate articles were eliminated. Because no fact-checking platform existed in Kurdish at the time, articles were also gathered from various news sources and social media pages. Following the dataset's annotation criteria, each public news source was assigned to one of two categories: Fake or Real. Each labeled article received its designation based on the category of its public source, the Kurdistan Journalist Syndicate's guidelines for Kurdish journalism, and various additional criteria based on social media platforms.
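The deduplication and source-based labeling described above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' actual scripts: the record fields (`source`, `text`), the `SOURCE_LABELS` mapping, and the source names in it are all hypothetical, and labeling purely by source category is a simplification of the fuller criteria the description mentions.

```python
import unicodedata

# Hypothetical mapping from source name to category. The description does not
# list the real sources, so these names are placeholders.
SOURCE_LABELS = {
    "example-news-site": "Real",
    "example-facebook-page": "Fake",
}


def normalize(text):
    """Apply Unicode NFKC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def dedupe_and_label(articles, source_labels):
    """Drop exact duplicates (after normalization) and attach a source-level label.

    Articles from sources missing in source_labels are skipped rather than guessed.
    """
    seen = set()
    labeled = []
    for article in articles:
        key = normalize(article["text"])
        if not key or key in seen:
            continue  # empty text or a duplicate of an earlier article
        seen.add(key)
        label = source_labels.get(article["source"])
        if label is None:
            continue  # unknown source: no category to assign
        labeled.append({**article, "text": key, "label": label})
    return labeled
```

Matching on normalized text only catches exact duplicates. Near-duplicates, such as the same story reposted with minor edits, would need a fuzzier comparison, which the description does not specify.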
Created:
2024-01-23
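Background overview
This is the first fake news detection dataset focused on the Sorani dialect of Kurdish. It contains news articles collected from official news websites and social media, cleaned and labeled as either real or fake, and it covers three major cities in Kurdistan.
The summary above was collected and generated by 遇见数据集.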