PAN 2017 Author Profiling Dataset

Name: PAN 2017 Author Profiling Dataset
Creator: PAN 2017 Author Profiling shared task committee
License: No description available

Indexed from arXiv: 2025-09-30

Download link:

http://pan.webis.de/clef17/pan17-web/author-profiling.html

Description:

This dataset provides 100 tweets per author and covers four languages. It is intended for gender identification and native-language (language variety) identification. The data is stored in XML format and ships with a ground truth file that lists the gender and language variety category for each author. Classifier performance varies with the minimum document frequency threshold used. The listed dataset size is 400 tweets (100 per author), and the tasks are gender prediction and native-language identification.
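As a rough illustration of how such data might be read, the sketch below parses one author's XML, one ground truth line, and builds a vocabulary filtered by minimum document frequency. The XML layout (`<author lang="...">` wrapping `<document>` elements), the `:::` separator, and all function names are assumptions made for this example; the description above does not specify them, so check them against the actual files.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumed layout of one author file (not confirmed by the description above).
SAMPLE_XML = """<author lang="en">
  <documents>
    <document><![CDATA[first example tweet]]></document>
    <document><![CDATA[second example tweet]]></document>
  </documents>
</author>"""

# Assumed ground truth line format: author_id:::gender:::language_variety
SAMPLE_TRUTH = "abc123:::female:::canada"


def parse_author(xml_text):
    """Return (language, list of tweets) for one author document."""
    root = ET.fromstring(xml_text)
    tweets = [(doc.text or "").strip() for doc in root.iter("document")]
    return root.get("lang"), tweets


def parse_truth_line(line, sep=":::"):
    """Split one ground truth line into its three assumed fields."""
    author_id, gender, variety = line.strip().split(sep)
    return {"author": author_id, "gender": gender, "variety": variety}


def vocabulary(docs, min_df=1):
    """Keep tokens that occur in at least `min_df` documents."""
    doc_freq = Counter()
    for text in docs:
        # Count each token once per document, not once per occurrence.
        doc_freq.update(set(text.lower().split()))
    return sorted(tok for tok, n in doc_freq.items() if n >= min_df)


lang, tweets = parse_author(SAMPLE_XML)
truth = parse_truth_line(SAMPLE_TRUTH)
print(lang, len(tweets), truth["gender"], truth["variety"])
print(vocabulary(tweets, min_df=2))
```

Raising `min_df` shrinks the vocabulary to tokens shared across more tweets, which is one plausible reason the reported performance depends on that threshold.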

Provided by:

PAN 2017 Author Profiling shared task committee

Background overview

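The PAN 2017 Author Profiling Dataset is an author profiling dataset for identifying an author's gender and language variety from Twitter text. It contains text in multiple languages and provides training and test data, and it supports gender and language variety identification either as a joint task or as separate tasks.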
