TRACES Bulgarian Twitter Dataset on Lies and Manipulation Annotated with Linguistic Markers of Lies

Indexed in NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/7614317
Resource description:
This dataset was created within Project TRACES (more information: https://traces.gate-ai.eu/). It is in .csv format and contains the IDs of 32,518 annotated tweets written in Bulgarian. Note: this dataset is not fact-checked; the social media messages were retrieved via keywords. For fact-checked datasets, see our other datasets. The dataset can be used for general purposes or for building lie and disinformation detection applications (by using the annotations with the linguistic markers of lies).

The tweets (written between 1 January 2020 and 27 June 2022) were collected via the Twitter API under academic access in June-July 2022 with the following keywords:
(лъжа OR лъжи OR лицемерие OR лъжат OR излъга OR измама OR измамници OR измами OR лъжец OR лъжци)
(фалшиви OR fakenews OR невярно OR неверни OR подвеждащи OR подвеждащо OR неистини), without retweets
(манипулация OR манипулира OR стъкмистика OR крие OR далавераджия OR далавери OR далавера), without retweets

An explanation of which fields can be used as markers of lies (or of intentional disinformation) is provided in our forthcoming paper: Irina Temnikova, Silvia Gargova, Ruslana Margova, Veneta Kireva, Ivo Dzhumerov, Tsvetelina Stefanova and Hristiana Nikolaeva (2023) New Bulgarian Resources for Detecting Disinformation. 10th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (LTC'23). Poznań, Poland.
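The keyword groups listed above can be expressed as Twitter API v2 search-query strings. The sketch below is a hypothetical reconstruction, not the TRACES team's actual collection code. It assumes each parenthesized group was run as a separate query, and it follows the description literally in marking only the second and third groups as excluding retweets (via the v2 `-is:retweet` operator). The constant names and the `build_query` helper are illustrative. The date range would be passed as request parameters (for example `start_time`/`end_time`) rather than inside the query string.

```python
# Hypothetical reconstruction of the TRACES keyword groups as Twitter API v2
# search queries. Treating each group as a separate query is an assumption.

LIE_TERMS = ["лъжа", "лъжи", "лицемерие", "лъжат", "излъга",
             "измама", "измамници", "измами", "лъжец", "лъжци"]
FAKE_TERMS = ["фалшиви", "fakenews", "невярно", "неверни",
              "подвеждащи", "подвеждащо", "неистини"]
MANIPULATION_TERMS = ["манипулация", "манипулира", "стъкмистика", "крие",
                      "далавераджия", "далавери", "далавера"]


def build_query(terms: list[str], exclude_retweets: bool) -> str:
    """Join terms into one parenthesized OR group, optionally excluding retweets."""
    query = "(" + " OR ".join(terms) + ")"
    if exclude_retweets:
        # '-is:retweet' is the v2 search operator that drops retweets.
        query += " -is:retweet"
    return query


# As written in the description, only the second and third groups
# are marked "without retweets".
QUERIES = [
    build_query(LIE_TERMS, exclude_retweets=False),
    build_query(FAKE_TERMS, exclude_retweets=True),
    build_query(MANIPULATION_TERMS, exclude_retweets=True),
]

if __name__ == "__main__":
    for q in QUERIES:
        print(q)
```

Keeping each group in its own list makes it easy to check a query against the term lists in the description. If the first group was also collected without retweets, change its `exclude_retweets` flag. Note that in v2 search syntax, placing two parenthesized groups side by side in a single query means AND, not OR.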
Created:
2024-12-03