
Cross-Genre Gender Prediction (GxG) dataset

Source: DataCite Commons. Updated: 2022-06-01. Indexed: 2024-07-13.
Download link:
https://live.european-language-grid.eu/catalogue/corpus/7367
Resource description:
The GxG dataset collects 21,874 documents from five genres (Twitter, YouTube, children's writing, news/journalism, personal diaries), each annotated with the gender of its author. The dataset was used in the Cross-Genre Gender Prediction (GxG) task (https://sites.google.com/view/gxg2018), a shared task on gender-based author profiling of Italian texts with a specific focus on cross-genre performance, organised as part of EVALITA 2018 (http://www.evalita.it/2018).

The dataset is divided into training and test data, comprising 454,585 and 374,012 tokens respectively. The distribution of gender labels is controlled in each dataset (50/50). To comply with GDPR privacy rules and Twitter's policies, the identifiers of tweets and users have been anonymized and replaced by unique identifiers.
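As a rough illustration of the figures in the description, the sketch below computes each split's share of the total token count and checks a list of gender labels for a 50/50 balance. The label values `"F"` and `"M"` and the helper names `token_share` and `is_balanced` are assumptions made for this example; the page does not document the dataset's file format or label encoding.

```python
from collections import Counter

# Token counts quoted in the dataset description.
TRAIN_TOKENS = 454585
TEST_TOKENS = 374012


def token_share(train: int, test: int) -> tuple[float, float]:
    """Return the fraction of all tokens held by the training and test splits."""
    total = train + test
    return train / total, test / total


def is_balanced(labels, classes=("F", "M")) -> bool:
    """Return True when all classes are present and occur equally often.

    The class names are hypothetical; adjust them to the real label encoding.
    """
    counts = Counter(labels)
    return counts[classes[0]] > 0 and len({counts[c] for c in classes}) == 1


train_share, test_share = token_share(TRAIN_TOKENS, TEST_TOKENS)
print(f"train: {train_share:.1%}, test: {test_share:.1%}")

# Toy labels standing in for one split of the dataset.
toy_labels = ["F", "M", "M", "F"]
print(is_balanced(toy_labels))
```

Running the same balance check per genre, rather than per split, would show whether the 50/50 control also holds inside each genre, which the description does not state.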
Provider:
ELG
Created:
2022-06-01