TAG-it Dataset
Source: DataCite Commons. Updated: 2022-06-01. Indexed: 2024-07-13.
Download link:
https://live.european-language-grid.eu/catalogue/corpus/8112
Description:
The TAG-it dataset consists of texts scraped from the ForumFree platform, written by 2,458 users and collected by Maslennikova et al. (2019). Texts are classified by topic, and information about the authors’ gender and age is provided. The dataset was created and used in the context of TAG-it (https://sites.google.com/view/tag-it-2020/), an author profiling shared task for Italian proposed for the 2020 EVALITA campaign (http://www.evalita.it/2020).

The challenge comprised two subtasks: (1) given a collection of texts (forum posts), systems must predict the gender and age of the author, together with the topic of the posts; (2) for posts drawn from a small selection of topics not represented in the training data, systems must predict either gender (Task 2a) or age (Task 2b).

The dataset is divided into training and test sets. The training data is the same for Task 1 and Task 2 and consists of 647,918 tokens. The test set for Task 1 contains texts on the same topics as the training data; the test set for Task 2 covers topics not present in the training data.
Provider:
ELG
Created:
2022-06-01



