TRACES Bulgarian Telegram Dataset Annotated with Linguistic Markers of Lies
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/7614293
Description:
This dataset was created within Project TRACES (more information: https://traces.gate-ai.eu/). It contains 8791 anonymized Telegram social media posts written in Bulgarian. The dataset is annotated with general information (named entities, part-of-speech tags, sentence length, etc.) and with specific markers signaling details, and it can be used for general purposes or for building applications that detect lies, manipulation, and disinformation.
Note: this dataset is not fact-checked; the social media messages were retrieved via keywords. For fact-checked datasets, see our other datasets.
The social media posts were collected via Telegram Desktop in June-July 2022.
Explanations of which fields can be used as markers of lies (or of intentional disinformation) are provided in our forthcoming paper:
Irina Temnikova, Silvia Gargova, Ruslana Margova, Veneta Kireva, Ivo Dzhumerov, Tsvetelina Stefanova and Hristiana Nikolaeva (2023) New Bulgarian Resources for Detecting Disinformation. 10th Language and Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics (LTC'23). Poznań, Poland.
Created:
2024-12-03



