
Dataset for the "The Schwurbelarchiv: a German Language Telegram dataset for the Study of Conspiracy Theories" paper

Indexed from the NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14704627
Description:
Dataset for the "The Schwurbelarchiv: a German Language Telegram dataset for the Study of Conspiracy Theories" paper.

For access to the dataset, please contact Prof. Jana Lasser at jana.lasser@uni-graz.at.

The file schwurbelarchiv_full.csv (38.9 GB) has 57,288,417 entries and contains the fields:

uid: Message UID (string).
group_name: Name of the group or channel (string).
mid_message: Unique ID of the message in the group (int).
mid_file: Unique ID of the file in the group (int).
posting_date: Date and time of the message in the group (string).
fwd_posting_date_message: Date and time of the message in the original group (string).
posting_date_file: Date and time of the file in the group (string).
fwd_posting_date_file: Date and time of the file in the original group (string).
message: Content of the message (string).
fwd_message: Content of the forwarded message (string).
transcribed_message: Transcribed content from speech-containing media files (string).
fwd_transcribed_message: Transcribed content from speech-containing media files in forwarded messages (string).
link_url: URL found in the message, typically in an HTML tag (string).
website: The website associated with the link URL (string).
replied_to: UID of the message being replied to (string).
message_id: Either the mid_message or mid_file (int).
media_file: Name of the attached media file (string).
media_file_type: Type of media file ("voice message", "video", or "photo") (string).
fwd_media_file: Internal path of a forwarded media file (string).
fwd_media_file_type: Type of forwarded media file ("voice message", "video", or "photo") (string).
day: Day of the month (int, 1–31).
day_of_year: Day of the year (int, 1–366).
weekday: Day of the week (int, 1–7).
week: Week of the year (int, 1–52).
month: Month of the year (int, 1–12).
year: Year (int).
message_hash: Hashed value of the message content for network detection (string).
fwd_message_hash: Hashed value of the forwarded message content for network detection (string).
author: Hashed name of the message author (string).
fwd_author: Hashed name of the forwarded message author (string).
lang_message: Language detected in the message (string).
lang_fwd_message: Language detected in the forwarded message (string).
is_unused: Indicates if the message originated from flagged ("dump") groups (boolean).
file_size_mb: Size of the attached media file in the message (float).
fwd_file_size_mb: Size of the attached media file in the forwarded message (float).
duration_sec_audio: Duration of the audio file in the message (float, seconds).
fwd_duration_sec_audio: Duration of the audio file in the forwarded message (float, seconds).
duration_sec_video: Duration of the video file in the message (float, seconds).
fwd_duration_sec_video: Duration of the video file in the forwarded message (float, seconds).
file_exists: Indicates if the referenced media file was found in the data (boolean).
file_error: Indicates if the referenced media file was parseable (boolean).
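
At 38.9 GB, the file is unlikely to fit in memory on most machines, so one plausible way to work with it is to stream it in chunks and load only the columns needed. The sketch below is not part of the dataset's documentation: it assumes pandas is installed, that the CSV has a header row using the field names described above, and that is_unused is stored as True/False or 1/0 (the actual encoding is not stated in the description). The function name count_messages_per_group and the in-memory sample are hypothetical, for illustration only.

```python
import io
from collections import Counter

import pandas as pd

# Only two of the documented fields are loaded, to keep memory use low.
USECOLS = ["group_name", "is_unused"]


def count_messages_per_group(csv_source, chunksize=1_000_000):
    """Stream a CSV in chunks and count rows per group_name,
    skipping rows whose is_unused flag is set."""
    counts = Counter()
    reader = pd.read_csv(
        csv_source,
        usecols=USECOLS,
        dtype={"group_name": "string"},
        chunksize=chunksize,
    )
    for chunk in reader:
        # Assumption: the flag is encoded as True/False or 1/0.
        # Normalising via strings handles either; missing values count as not flagged.
        flag = chunk["is_unused"].astype("string").str.lower()
        flagged = flag.isin(["true", "1", "1.0"])
        kept = chunk[~flagged]
        for name, n in kept["group_name"].value_counts().items():
            counts[name] += int(n)
    return counts


# Tiny in-memory stand-in for schwurbelarchiv_full.csv (made-up rows).
sample = io.StringIO(
    "group_name,is_unused,message\n"
    "groupA,False,hallo\n"
    "groupA,False,welt\n"
    "groupB,True,dump\n"
    "groupB,False,test\n"
)
demo_counts = count_messages_per_group(sample, chunksize=2)
print(demo_counts.most_common())
```

For the real file, the same function could be called with the path to schwurbelarchiv_full.csv in place of the in-memory sample; the chunk size is a tuning parameter and the default here is only a guess.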
Created:
2025-04-12