TAG-it Dataset

Source: DataCite Commons. Updated: 2022-06-01. Indexed: 2024-07-13.
Download link:
https://live.european-language-grid.eu/catalogue/corpus/8112
Resource description:
The TAG-it dataset is composed of texts scraped from the ForumFree platform, written by 2,458 users and collected by Maslennikova et al. (2019). Texts are classified by topic, and information about the authors' gender and age is provided. The dataset was created and used in the context of TAG-it (https://sites.google.com/view/tag-it-2020/), an author profiling shared task for Italian proposed for the 2020 EVALITA campaign (http://www.evalita.it/2020). The challenge comprised two subtasks: 1- Given a collection of texts (forum posts), systems must predict the gender and the age of the author, together with the topic the posts are about; 2- For posts coming from a small selection of topics not represented in the training data, systems must predict either gender (Task 2a) or age (Task 2b).

The dataset is divided into training and test sets. The training data is the same for Task 1 and Task 2 and consists of 647,918 tokens. The test set for Task 1 is composed of texts on the same topics as the training data; the test set for Task 2 covers topics not present in the training data.
Provider:
ELG
Created:
2022-06-01