Kurdish Dataset for Fake News Detection (KDFND)
Source: Mendeley Data. Updated: 2024-03-27. Indexed: 2024-06-26.
Download link:
https://data.mendeley.com/datasets/grdvwrnkv6
Resource description:
Articles were scraped from well-known Kurdish news websites that are officially recognized by the Kurdistan Journalists Syndicate, and from Facebook pages. The public websites cover three cities administered by the Kurdistan Regional Government of Iraq: Erbil, Sulaimani, and Halabja. The news articles are written in Kurdish. This dataset is also notable for being the first and largest in the Kurdish language to concentrate on the Sorani dialect. Over the course of a year, articles were scraped daily from the preselected public news sources using Facepager, Web Scraper, and Python scripts. All duplicate articles were removed. Because no Kurdish-language fact-checking platform existed at the time, articles were also gathered from various news sources and social media pages. Under the dataset's annotation criteria, each public news source was assigned to one of two categories: Fake or Real. Each labeled article received its label based on the category of its public source, the Kurdistan Journalists Syndicate's guidelines for Kurdish journalism, and various additional criteria based on social media platforms.
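The duplicate removal and source-based labeling described above can be pictured with a minimal Python sketch. It is not the authors' pipeline: the source names in `SOURCE_LABELS`, the `text` and `source` field names, and the whitespace/Unicode normalization used to detect duplicates are all assumptions made for illustration, and the sketch covers only the source-category criterion, not the syndicate guidelines or the social media criteria.

```python
import unicodedata

# Hypothetical source-to-label mapping; the dataset's real source list is not given here.
SOURCE_LABELS = {
    "example-news-site": "Real",
    "example-social-page": "Fake",
}


def normalize(text: str) -> str:
    """Apply NFKC normalization and collapse whitespace so trivially different copies compare equal."""
    return " ".join(unicodedata.normalize("NFKC", text).split())


def dedupe_and_label(articles):
    """Drop duplicate articles and label the rest by the category of their source."""
    seen = set()
    labeled = []
    for article in articles:
        key = normalize(article["text"])
        if not key or key in seen:
            continue  # skip empty texts and copies already seen
        seen.add(key)
        label = SOURCE_LABELS.get(article["source"])
        if label is None:
            continue  # source has no assigned category (assumption: skip it)
        labeled.append({"text": key, "source": article["source"], "label": label})
    return labeled
```

In this sketch the first occurrence of a text wins, so an article that appears under several sources keeps the label of the source seen first; a real pipeline would need an explicit rule for that case.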
Created:
2024-01-23
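Collected Summary
Dataset Introduction

Background and Challenges
Background Overview
This is the first fake news detection dataset focused on the Sorani dialect of Kurdish. It contains news articles collected from official news websites and social media, cleaned and labeled as either real or fake, and covers three major cities in Kurdistan.
The above content was collected and summarized by 遇见数据集.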



