PAN20 Authorship Analysis: Celebrity Profiling

Indexed from NIAID Data Ecosystem on 2026-03-12
Download link:
https://zenodo.org/record/3691921
Description:
Synopsis

Task: Given the Twitter feeds of a celebrity's followers, determine the celebrity's occupation, age, and gender.
Evaluation: [code]
Baselines: [code]
See the full Shared Task [here].

The datasets contain three files: follower-feeds.ndjson as input, labels.ndjson as output, and celebrity-feeds.ndjson for additional study. Each file lists all celebrities as JSON objects, one per line, each identified by the id key. The training dataset contains 1,920 celebrities and is balanced with respect to gender and occupation. The supplement dataset contains the remaining 8,265 celebrities but is not balanced in any way.

The follower-feeds.ndjson file contains the English tweets of at least 10 followers for each celebrity, with at least 50 tweets per follower, excluding retweets:

{"id": 1234, "text": [["a tweet of follower 1", "another tweet of follower 1", ...], ["a tweet of follower 2", ...], ...]}
{"id": 5678, "text": [["a tweet of follower 1", "another tweet of follower 1", ...], ["a tweet of follower 2", ...], ...]}

The celebrity-feeds.ndjson file contains the Twitter timelines of the celebrities themselves, formatted as:

{"id": 1234, "text": ["a tweet of celebrity 1", "another tweet of celebrity 1", ...]}
{"id": 5678, "text": ["a tweet of celebrity 2", "another tweet", ...]}

The labels.ndjson file contains the classes to be predicted. Given follower-feeds.ndjson, a valid submission must produce a labels.ndjson that contains an entry for each id in the input:

{"id": 1234, "occupation": "sports", "gender": "female", "birthyear": 2002}
{"id": 5678, "occupation": "professional", "gender": "male", "birthyear": 1990}

The following values are possible for each trait:

occupation := {sports, performer, creator, politics}
birthyear := {1940, ..., 1999}
gender := {male, female}
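The nested layout of follower-feeds.ndjson (one inner list of tweets per follower) can be read with Python's standard json module. The following is a minimal sketch, assuming each line is one complete JSON object as described; the helper names read_ndjson and summarize_follower_feed are hypothetical, and the check only restates the stated minimums of 10 followers and 50 tweets per follower.

```python
import json
from typing import Iterable, Iterator

# Minimums restated from the dataset description (hypothetical constant names).
MIN_FOLLOWERS = 10
MIN_TWEETS_PER_FOLLOWER = 50


def read_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one JSON object per non-blank line; accepts an open file or a list of strings."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize_follower_feed(record: dict) -> dict:
    """Count followers and tweets for one celebrity in follower-feeds.ndjson.

    record["text"] holds one inner list of tweets per follower.
    """
    feeds = record["text"]
    tweet_counts = [len(tweets) for tweets in feeds]
    # True only if there are enough followers AND every follower has enough tweets.
    meets_minimums = len(feeds) >= MIN_FOLLOWERS and all(
        n >= MIN_TWEETS_PER_FOLLOWER for n in tweet_counts
    )
    return {
        "id": record["id"],
        "followers": len(feeds),
        "tweets": sum(tweet_counts),
        "meets_minimums": meets_minimums,
    }
```

Because read_ndjson only needs an iterable of lines, it can be passed an open file object directly and will stream the feed file line by line instead of loading it into memory at once.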
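A submission must contain one labels.ndjson entry per input id and use only the listed trait values. The Python sketch below writes a trivial constant prediction and then checks a submission against those two rules. The constant label and the helper names (constant_predictions, validate_labels) are hypothetical illustrations, not the official evaluator or baselines linked from the task page.

```python
import json
from typing import Iterable, Iterator

# Allowed values, as listed in the task description.
OCCUPATIONS = {"sports", "performer", "creator", "politics"}
GENDERS = {"male", "female"}
BIRTHYEARS = range(1940, 2000)  # 1940, ..., 1999 inclusive

# Hypothetical placeholder prediction; NOT the official PAN baseline.
DEFAULT_LABEL = {"occupation": "sports", "gender": "male", "birthyear": 1980}


def constant_predictions(records: Iterable[dict]) -> Iterator[str]:
    """Emit one labels.ndjson line per input record, always predicting DEFAULT_LABEL."""
    for rec in records:
        yield json.dumps({"id": rec["id"], **DEFAULT_LABEL})


def validate_labels(input_ids: Iterable[int], label_lines: Iterable[str]) -> list[str]:
    """Return the problems found in a labels.ndjson submission (empty list if valid)."""
    problems = []
    seen = set()
    checks = (
        ("occupation", OCCUPATIONS),
        ("gender", GENDERS),
        ("birthyear", BIRTHYEARS),
    )
    for line in label_lines:
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        entry = json.loads(line)
        cid = entry.get("id")
        seen.add(cid)
        # Each trait must take one of the allowed values.
        for trait, allowed in checks:
            value = entry.get(trait)
            if value not in allowed:
                problems.append(f"id {cid}: invalid {trait} {value!r}")
    # Every id in the input must have an entry in the output.
    for cid in sorted(set(input_ids) - seen):
        problems.append(f"id {cid}: missing from labels")
    return problems
```

Under the value sets listed in the description, this checker would flag both example label lines given there (birthyear 2002 and occupation "professional" fall outside the allowed sets).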

Created: 2021-01-25