Processed data for the article "Perfilado Demográficos de Celebridades en Redes Sociales" - "Demographic Profiling of Celebrities in Social Networks"
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/4767750
Description:
This dataset contains all the processed data used in the experiments for the article "Perfilado Demográficos de Celebridades en Redes Sociales" - "Demographic Profiling of Celebrities in Social Networks", published in the journal Research in Computer Science. It is a processed version of the training part of the CLEF 2020 celebrity profiling task (https://pan.webis.de/clef20/pan20-web/celebrity-profiling.html). The dataset consists of 5,066,608 tweets from 1,920 Twitter celebrities. All the tweets are in English. The dataset includes the following files:
1. The 5,066,608 tweets in English
2. Four files indicating the gender, age, occupation, and user associated with each tweet.
3. A list of 1,374 common English abbreviations used in social networks.
4. The five features extracted from the tweets and used for the experiments: words, emoticons/emojis, hashtags, ats, and abbreviations.
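
To make the five feature types concrete, the sketch below splits a single tweet into words, emoticons/emojis, hashtags, ats (taken here to mean @-mentions), and abbreviations. It is only an illustration: the function name `extract_features`, the regular expressions, and the three-entry `ABBREVIATIONS` set (a stand-in for the 1,374-entry list) are assumptions and do not reproduce the authors' actual extraction procedure.

```python
import re

# Hypothetical stand-in for the dataset's list of 1,374 abbreviations.
ABBREVIATIONS = {"lol", "omg", "btw"}

HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")
# A few ASCII emoticons plus common emoji code-point ranges (not exhaustive).
EMOTICON_RE = re.compile(r"[:;=][-']?[)(DPp]|[\U0001F300-\U0001FAFF\u2600-\u27BF]")
WORD_RE = re.compile(r"[a-z']+")


def extract_features(tweet: str) -> dict:
    """Split one tweet into the five feature types named in the description."""
    hashtags = HASHTAG_RE.findall(tweet)
    ats = MENTION_RE.findall(tweet)
    emoticons = EMOTICON_RE.findall(tweet)

    # Blank out the tokens captured above so they are not counted as words.
    rest = tweet
    for pattern in (HASHTAG_RE, MENTION_RE, EMOTICON_RE):
        rest = pattern.sub(" ", rest)

    tokens = WORD_RE.findall(rest.lower())
    return {
        "words": [t for t in tokens if t not in ABBREVIATIONS],
        "emoticons": emoticons,
        "hashtags": hashtags,
        "ats": ats,
        "abbreviations": [t for t in tokens if t in ABBREVIATIONS],
    }


if __name__ == "__main__":
    print(extract_features("OMG loved the show with @jane :) #live"))
```

In this sketch a token counts as an abbreviation only on an exact, case-insensitive match against the list, so the word and abbreviation features never overlap. Whether the published features were built that way is not stated in the description.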
Created:
2021-05-18



