PAN19 Authorship Analysis: Celebrity Profiling

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/3530252
Description:
Paper: https://webis.de/publications.html?q=wiegmann_2019a
Source Dataset: https://files.webis.de/data-in-progress/data-research/social-media-analysis/acl19-celebrity-profiling/

Celebrities are among the most prolific users of social media, promoting their personas and rallying followers. This activity is closely tied to genuine writing samples, which makes celebrities worthwhile research subjects in many respects, not least for author profiling.

This year's Celebrity Profiling task is to predict four traits of a celebrity from their social media communication: degree of fame, occupation, age, and gender. The social media communication is given as the teaser messages from past tweets. The goal is to develop a piece of software that predicts celebrity traits from this teaser history.

The training dataset contains two files: feeds.ndjson as input and labels.ndjson as output. Each file lists all celebrities as JSON objects, one per line, identified by the id key.

The input file contains the celebrity's id and a list of all teaser messages for that celebrity:
{"id": 1234, "text": ["a tweet", "another tweet", ...]}

The output file contains the celebrity's id and a value for each trait, for every celebrity in the input file:
{"id": 1234, "fame": "star", "occupation": "sports", "gender": "female", "birthyear": 2002}

The following values are possible for each of the traits:
fame := {rising, star, superstar}
occupation := {sports, performer, creator, politics, manager, science, professional, religious}
birthyear := {1940, ..., 2012}
gender := {male, female, nonbinary}
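
As an illustration of the file layout described above, here is a minimal Python sketch that reads the two NDJSON inputs, joins them on the id key, and checks label values against the listed trait sets. The function names (parse_ndjson, join_feeds_and_labels, validate_label) are hypothetical helpers introduced for this example and are not part of the PAN19 tooling. The sketch assumes each line is a complete JSON object with exactly the keys shown in the examples.

```python
import json
from typing import Dict, Iterable, Iterator, List

# Allowed trait values, copied from the dataset description.
FAME = {"rising", "star", "superstar"}
OCCUPATION = {"sports", "performer", "creator", "politics",
              "manager", "science", "professional", "religious"}
GENDER = {"male", "female", "nonbinary"}
BIRTHYEAR = range(1940, 2013)  # 1940 through 2012 inclusive


def parse_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one JSON object per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def join_feeds_and_labels(feed_lines: Iterable[str],
                          label_lines: Iterable[str]) -> Dict[int, dict]:
    """Merge feed and label records that share the same id (hypothetical helper)."""
    labels = {rec["id"]: rec for rec in parse_ndjson(label_lines)}
    joined: Dict[int, dict] = {}
    for feed in parse_ndjson(feed_lines):
        cid = feed["id"]
        if cid in labels:
            traits = {k: v for k, v in labels[cid].items() if k != "id"}
            joined[cid] = {"text": feed["text"], **traits}
    return joined


def validate_label(label: dict) -> List[str]:
    """Return the names of traits whose value is missing or outside the listed sets."""
    problems = []
    if label.get("fame") not in FAME:
        problems.append("fame")
    if label.get("occupation") not in OCCUPATION:
        problems.append("occupation")
    if label.get("gender") not in GENDER:
        problems.append("gender")
    if label.get("birthyear") not in BIRTHYEAR:
        problems.append("birthyear")
    return problems
```

Both functions accept any iterable of lines, so open file handles for feeds.ndjson and labels.ndjson can be passed directly. Because feeds can be large, a streaming approach like this avoids holding the raw file text in memory, although the joined dictionary itself still grows with the number of celebrities.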
Created:
2023-10-24