Cross-Genre Gender Prediction (GxG) dataset

Source: DataCite Commons. Updated 2022-06-01; indexed 2024-07-13.

Download link:

https://live.european-language-grid.eu/catalogue/corpus/7367

Description:

The GxG dataset collects 21,874 documents from five genres (Twitter, YouTube, children's writing, news/journalism, and personal diaries), each annotated with the gender of its author. It was used in the Cross-Genre Gender Prediction (GxG) task (https://sites.google.com/view/gxg2018), a shared task on gender-based author profiling of Italian texts with a specific focus on cross-genre performance, organised as part of EVALITA 2018 (http://www.evalita.it/2018).

The dataset is divided into training and test data, comprising 454,585 and 374,012 tokens respectively. Gender labels are balanced (50/50) in each set. To comply with GDPR privacy rules and Twitter's policies, the original tweet and user identifiers have been anonymized and replaced with unique identifiers.

Provider:

ELG

Created:

2022-06-01
