Messages from alternative Spanish Telegram channels, 2019-2024
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/15065452
Description:
This dataset contains processed data extracted from Telegram channels using pytopicgram from 2019-12-01 to 2024-08-31. It includes anonymized channel information, sampled messages, and topics identified using BERTopic. The data has been anonymized and structured for ease of analysis. The dataset comprises two main CSV files:
1. Topics (topics.csv)
This file contains topics extracted from the full dataset using BERTopic. Each topic is described by a concise text generated by OpenAI o1.
Column Name: Description
Topic: Numeric identifier for each topic. -1 is the generic topic for non-assignable messages.
Name: Human-readable name summarizing the topic.
Representation: List of representative keywords for the topic.
Description: Concise description of the topic generated by OpenAI.
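
As a minimal sketch of how the topics.csv columns described above might be read, the following uses only the Python standard library. It assumes the Representation column is stored as a Python-style list literal (a common result of exporting BERTopic output to CSV, but not confirmed by this description), and the rows in SAMPLE are invented placeholders, not values from the dataset. The names load_topics and assignable are hypothetical helpers.

```python
import ast
import csv
import io

# -1 is the generic topic for non-assignable messages (per the column description).
OUTLIER_TOPIC = -1


def load_topics(fileobj):
    """Read topics.csv-style rows into a dict keyed by integer topic id."""
    topics = {}
    for row in csv.DictReader(fileobj):
        topic_id = int(row["Topic"])
        # Assumption: Representation holds a list literal such as "['a', 'b']".
        # If it does not parse, fall back to keeping the raw string.
        try:
            keywords = ast.literal_eval(row["Representation"])
        except (ValueError, SyntaxError):
            keywords = [row["Representation"]]
        topics[topic_id] = {
            "name": row["Name"],
            "keywords": keywords,
            "description": row["Description"],
        }
    return topics


def assignable(topics):
    """Drop the generic -1 topic, keeping only assignable topics."""
    return {tid: t for tid, t in topics.items() if tid != OUTLIER_TOPIC}


# Invented placeholder rows, only to make the sketch runnable.
SAMPLE = """Topic,Name,Representation,Description
-1,outliers,"['a', 'b']",Placeholder description
0,example_topic,"['c', 'd']",Placeholder description
"""

topics = load_topics(io.StringIO(SAMPLE))
kept = assignable(topics)
```

With the real file, the same function could be called on `open("topics.csv", newline="", encoding="utf-8")` in place of the in-memory sample.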
2. Messages (messages.csv)
This file contains a 25% sample of messages from Telegram channels, stratified on the topic column.
Column Name: Description
channel_id: Anonymized identifier for the Telegram channel.
week_year: Week and year when the message was posted (format: week_year).
media_type: Type of media included in the message (txt, img, video, audio, doc, web).
reach: Number of users reached by the message.
virality: Virality score of the message.
is_viral: Boolean indicating whether the message is considered viral.
topics: Topic identifier associated with the message.
probs: Probability scores for topic assignment.
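
The messages.csv columns described above could be processed along the following lines. This is a sketch under stated assumptions, not the authors' pipeline: it assumes week_year values look like "48_2019" (week, underscore, year) and that is_viral is serialized as "True"/"False". The rows in SAMPLE are invented placeholders, and parse_week_year, parse_bool, and viral_share_by_topic are hypothetical helper names.

```python
import csv
import io
from collections import Counter


def parse_week_year(value):
    """Split a week_year string into (year, week) integers.

    Assumption: the value is formatted as "<week>_<year>", e.g. "48_2019".
    """
    week, year = value.split("_")
    return int(year), int(week)


def parse_bool(value):
    """Interpret a CSV boolean; assumes "True"/"False" (or "1"/"0") text."""
    return value.strip().lower() in {"true", "1"}


def viral_share_by_topic(fileobj):
    """Return the fraction of messages flagged is_viral for each topic id."""
    totals, virals = Counter(), Counter()
    for row in csv.DictReader(fileobj):
        topic = int(row["topics"])
        totals[topic] += 1
        if parse_bool(row["is_viral"]):
            virals[topic] += 1
    return {topic: virals[topic] / totals[topic] for topic in totals}


# Invented placeholder rows, only to make the sketch runnable.
SAMPLE = """channel_id,week_year,media_type,reach,virality,is_viral,topics,probs
c1,48_2019,txt,100,0.5,False,0,0.9
c1,49_2019,img,400,2.0,True,0,0.8
c2,1_2020,video,50,0.1,False,-1,0.0
"""

shares = viral_share_by_topic(io.StringIO(SAMPLE))
first_week = parse_week_year("48_2019")
```

Because the file is a 25% sample stratified on topic, per-topic shares computed this way should roughly reflect the full dataset, while absolute message counts would need to be scaled accordingly.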
Created:
2025-04-11



