Crowd-Annotated Spanish Corpus for Humor Analysis

Name: Crowd-Annotated Spanish Corpus for Humor Analysis
Creator: Natural Language Processing Group, Faculty of Engineering, University of the Republic, Uruguay
Published: 2018-07-19 12:52:36
License: No description available

Source: arXiv. Updated 2018-07-19; indexed 2024-06-21.

Download link:

https://pln-fing-udelar.github.io/humor


Resource overview:

The *Crowd-Annotated Spanish Corpus for Humor Analysis* was created by the Natural Language Processing Group of the University of the Republic, Uruguay. The corpus contains 27,282 Spanish-language tweets drawn from both humorous and non-humorous accounts. Each tweet received about four annotations on average, each recording a humor value and a humor rating. The tweets were extracted from selected accounts and from real-time samples, then annotated through a crowdsourced web task. The corpus is intended mainly for building Spanish-language humor classifiers, and as a first step toward studying humor and its subjectivity.
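Because each tweet carries several crowd annotations, a common first step before training a classifier is to collapse them into one label per tweet. The sketch below is only an illustration under stated assumptions: the record layout `(tweet_id, is_humorous, rating)`, the sample values, and the strict-majority rule are all hypothetical and are not taken from the corpus files or from the authors' method.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical annotation records: (tweet_id, is_humorous, rating).
# The layout and values are assumptions for illustration only;
# rating is None when the annotator judged the tweet not humorous.
annotations = [
    ("t1", True, 4),
    ("t1", True, 2),
    ("t1", False, None),
    ("t1", True, 3),
    ("t2", False, None),
    ("t2", False, None),
    ("t2", True, 1),
]


def aggregate(records):
    """Group annotations per tweet and summarize them.

    Returns, for each tweet, the number of annotations, a strict-majority
    humor label, and the mean rating over the humorous votes (or None).
    """
    by_tweet = defaultdict(list)
    for tweet_id, is_humorous, rating in records:
        by_tweet[tweet_id].append((is_humorous, rating))

    summary = {}
    for tweet_id, votes in by_tweet.items():
        positive = sum(1 for is_humorous, _ in votes if is_humorous)
        ratings = [r for is_humorous, r in votes if is_humorous and r is not None]
        summary[tweet_id] = {
            "n_annotations": len(votes),
            "is_humorous": positive * 2 > len(votes),  # strict majority vote
            "mean_rating": mean(ratings) if ratings else None,
        }
    return summary


summary = aggregate(annotations)
for tweet_id, info in summary.items():
    print(tweet_id, info)
```

Majority voting discards disagreement between annotators. Since the corpus is also meant for studying the subjectivity of humor, keeping the per-tweet vote counts alongside the aggregated label may be the more useful choice.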

Provider:

Natural Language Processing Group, Faculty of Engineering, University of the Republic, Uruguay

Created:

2017-10-02
