
Messages from alternative Spanish Telegram channels, 2019-2024

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/record/15065452
Description:
This dataset contains processed data extracted from Telegram channels using pytopicgram from 2019-12-01 to 2024-08-31. It includes anonymized channel information, sampled messages, and topics identified using BERTopic. The data has been anonymized and structured for ease of analysis. The dataset comprises two main CSV files:

1. Topics (topics.csv)

This file contains topics extracted from the full dataset using BERTopic. Each topic is described by a concise text generated by OpenAI o1.

Column Name      Description
Topic            Numeric identifier for each topic. -1 is the generic topic for non-assignable messages.
Name             Human-readable name summarizing the topic.
Representation   List of representative keywords for the topic.
Description      Concise description of the topic generated by OpenAI.

2. Messages (messages.csv)

This file contains a 25% stratified sample of messages (on topic column) from Telegram channels.

Column Name      Description
channel_id       Anonymized identifier for the Telegram channel.
week_year        Week and year when the message was posted (format: week_year).
media_type       Type of media included in the message (txt, img, video, audio, doc, web).
reach            Number of users reached by the message.
virality         Virality score of the message.
is_viral         Boolean indicating whether the message is considered viral.
topics           Topic identifier associated with the message.
probs            Probability scores for topic assignment.
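The two files can be combined through the topic identifier: the topics column of messages.csv refers to the Topic column of topics.csv. The following is a minimal sketch of that join using only the Python standard library. It runs on small inline stand-ins instead of the real files, so every row value is made up for illustration; only the column names come from the description above. It also assumes that week_year uses an underscore separator (for example 49_2019) and that is_viral is stored as the text True or False, neither of which the description confirms.

```python
import csv
import io
from collections import Counter

# Inline stand-ins for topics.csv and messages.csv. The rows are hypothetical;
# only the column names are taken from the dataset description.
TOPICS_CSV = """Topic,Name,Representation,Description
-1,generic,"['a', 'b']",Messages that could not be assigned to a topic
0,example_topic,"['x', 'y']",Hypothetical topic description
"""

MESSAGES_CSV = """channel_id,week_year,media_type,reach,virality,is_viral,topics,probs
c1,49_2019,txt,120,0.4,False,0,0.91
c1,50_2019,img,900,2.5,True,0,0.77
c2,50_2019,video,40,0.1,False,-1,0.0
"""


def parse_week_year(value):
    """Split a week_year value into (year, week).

    Assumption: week and year are joined by an underscore, week first.
    """
    week, year = value.split("_")
    return int(year), int(week)


def load_topic_names(handle):
    """Map each numeric topic identifier to its human-readable name."""
    return {int(row["Topic"]): row["Name"] for row in csv.DictReader(handle)}


def viral_counts_by_topic(handle, topic_names, skip_generic=True):
    """Count viral messages per topic name, optionally skipping topic -1."""
    counts = Counter()
    for row in csv.DictReader(handle):
        topic = int(row["topics"])
        if skip_generic and topic == -1:
            continue
        # Assumption: is_viral is serialized as the text "True" or "False".
        if row["is_viral"].strip().lower() == "true":
            counts[topic_names.get(topic, str(topic))] += 1
    return counts


topic_names = load_topic_names(io.StringIO(TOPICS_CSV))
counts = viral_counts_by_topic(io.StringIO(MESSAGES_CSV), topic_names)
print(counts)
```

To run this against the downloaded files, replace the io.StringIO objects with open("topics.csv", newline="", encoding="utf-8") and the equivalent for messages.csv, then check the actual week_year and is_viral formats before relying on the parsing helpers. The probs column is left unparsed here because its serialization is not specified.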
Created:
2025-04-11