Processed data for the article "Perfilado Demográficos de Celebridades en Redes Sociales" - "Demographic Profiling of Celebrities in Social Networks"

Indexed by NIAID Data Ecosystem on 2026-03-12

Download link:

https://zenodo.org/record/4767750




Description:

This dataset includes all the processed data used in the experiments for the article "Perfilado Demográficos de Celebridades en Redes Sociales" ("Demographic Profiling of Celebrities in Social Networks"), published in the journal Research in Computer Science. It is a processed version of the training split of the CLEF 2020 celebrity profiling task (https://pan.webis.de/clef20/pan20-web/celebrity-profiling.html) and consists of 5,066,608 tweets from 1,920 Twitter celebrities. All tweets are in English. The dataset includes the following files:

1. The 5,066,608 English tweets.
2. Four files giving the gender, age, occupation, and user associated with each tweet.
3. A list of 1,374 common English abbreviations used in social networks.
4. The five features extracted from the tweets and used in the experiments: words, emoticons/emojis, hashtags, ats (@-mentions), and abbreviations.
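The five feature types listed above can be illustrated with a short Python sketch. This is a hypothetical approximation, not the authors' extraction code: the regular expressions for emoticons and emojis are rough assumptions, `ABBREVIATIONS` is a tiny hypothetical sample standing in for the 1,374-entry list shipped with the dataset, and nothing is assumed about the dataset's actual file layout.

```python
import re

# Hypothetical sample; the dataset ships its own list of 1,374 abbreviations.
ABBREVIATIONS = {"lol", "omg", "btw", "idk", "imo"}

HASHTAG_RE = re.compile(r"#\w+")
MENTION_RE = re.compile(r"@\w+")  # the "ats" feature
# Rough pattern for ASCII emoticons such as :) ;-( :D (illustrative only).
EMOTICON_RE = re.compile(r"[:;=8][\-o\*']?[\)\]\(\[dDpP/\\|]")
# Rough emoji code-point ranges (illustrative, not exhaustive).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
WORD_RE = re.compile(r"[a-z']+")


def extract_features(tweet):
    """Split one tweet into the five hypothetical feature lists."""
    hashtags = HASHTAG_RE.findall(tweet)
    mentions = MENTION_RE.findall(tweet)
    emoticons = EMOTICON_RE.findall(tweet) + EMOJI_RE.findall(tweet)

    # Remove hashtags, mentions, emoticons and emojis before word tokenization
    # so that, e.g., the "D" in ":D" is not counted as a word.
    text = tweet
    for pattern in (HASHTAG_RE, MENTION_RE, EMOTICON_RE, EMOJI_RE):
        text = pattern.sub(" ", text)
    tokens = WORD_RE.findall(text.lower())

    # Tokens found in the abbreviation list form their own feature.
    abbreviations = [t for t in tokens if t in ABBREVIATIONS]
    words = [t for t in tokens if t not in ABBREVIATIONS]

    return {
        "words": words,
        "emoticons": emoticons,
        "hashtags": hashtags,
        "mentions": mentions,
        "abbreviations": abbreviations,
    }


if __name__ == "__main__":
    print(extract_features("omg #Oscars2020 was amazing @TheAcademy :) 🎉 lol"))
```

Because emoticons are matched on the raw text, URL fragments such as `://` would be miscounted as emoticons; a fuller pipeline would likely strip URLs first.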

Created:

2021-05-18