Dataset for the "The Schwurbelarchiv: a German Language Telegram dataset for the Study of Conspiracy Theories" paper
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14704627
Description:
Dataset for the "The Schwurbelarchiv: a German Language Telegram dataset for the Study of Conspiracy Theories" paper.
For access to the dataset, please contact Prof. Jana Lasser at jana.lasser@uni-graz.at.
The file schwurbelarchiv_full.csv (38.9 GB) has 57,288,417 entries and contains the following fields:
uid: Message UID (string).
group_name: Name of the group or channel (string).
mid_message: Unique ID of the message in the group (int).
mid_file: Unique ID of the file in the group (int).
posting_date: Date and time of the message in the group (string).
fwd_posting_date_message: Date and time of the message in the original group (string).
posting_date_file: Date and time of the file in the group (string).
fwd_posting_date_file: Date and time of the file in the original group (string).
message: Content of the message (string).
fwd_message: Content of the forwarded message (string).
transcribed_message: Transcribed content from speech-containing media files (string).
fwd_transcribed_message: Transcribed content from speech-containing media files in forwarded messages (string).
link_url: URL found in the message, typically in an HTML tag (string).
website: The website associated with the link URL (string).
replied_to: UID of the message being replied to (string).
message_id: Either the mid_message or mid_file (int).
media_file: Name of the attached media file (string).
media_file_type: Type of media file ("voice message", "video", or "photo") (string).
fwd_media_file: Internal path of a forwarded media file (string).
fwd_media_file_type: Type of forwarded media file ("voice message", "video", or "photo") (string).
day: Day of the month (int, 1–31).
day_of_year: Day of the year (int, 1–366).
weekday: Day of the week (int, 1–7).
week: Week of the year (int, 1–52).
month: Month of the year (int, 1–12).
year: Year (int).
message_hash: Hashed value of the message content for network detection (string).
fwd_message_hash: Hashed value of the forwarded message content for network detection (string).
author: Hashed name of the message author (string).
fwd_author: Hashed name of the forwarded message author (string).
lang_message: Language detected in the message (string).
lang_fwd_message: Language detected in the forwarded message (string).
is_unused: Indicates if the message originated from flagged ("dump") groups (boolean).
file_size_mb: Size of the attached media file in the message (float).
fwd_file_size_mb: Size of the attached media file in the forwarded message (float).
duration_sec_audio: Duration of the audio file in the message (float, seconds).
fwd_duration_sec_audio: Duration of the audio file in the forwarded message (float, seconds).
duration_sec_video: Duration of the video file in the message (float, seconds).
fwd_duration_sec_video: Duration of the video file in the forwarded message (float, seconds).
file_exists: Indicates if the referenced media file was found in the data (boolean).
file_error: Indicates if the referenced media file was parseable (boolean).
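At 38.9 GB the file is unlikely to fit in memory, so one plausible way to work with it is to stream it row by row. The following is a minimal sketch, not the authors' tooling: it assumes the file is comma-separated with a header row matching the field names above, is UTF-8 encoded, and serializes booleans as "True"/"False". The function name count_messages_per_group and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def count_messages_per_group(lines):
    """Count messages per group, skipping rows flagged as is_unused.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    counts = Counter()
    for row in csv.DictReader(lines):
        # Assumption: booleans are written as "True"/"False" in the CSV.
        flag = (row.get("is_unused") or "").strip().lower()
        if flag == "true":
            continue  # row comes from a flagged ("dump") group
        counts[row["group_name"]] += 1
    return counts


# Tiny in-memory stand-in for the real file; all values are invented.
sample = io.StringIO(
    "uid,group_name,message,is_unused\n"
    "u1,group_a,hello,False\n"
    "u2,group_b,spam,True\n"
    "u3,group_a,world,False\n"
)
print(count_messages_per_group(sample))

# With the real file (path and encoding are assumptions):
# with open("schwurbelarchiv_full.csv", newline="", encoding="utf-8") as f:
#     counts = count_messages_per_group(f)
```

Opening the file with newline="" lets the csv module handle line breaks inside quoted message text. If long transcripts exceed the module's default field size limit, csv.field_size_limit can be raised before reading.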
Created:
2025-04-12



