ColBERT dataset - 200k short texts for humor detection

Name: ColBERT dataset - 200k short texts for humor detection
Creator: IEEE DataPort
Published: 2021-03-09 15:18:59
License: No description available

Source: DataCite Commons. Updated 2021-03-09; indexed 2025-04-16.

Download link:

https://ieee-dataport.org/documents/colbert-dataset-200k-short-texts-humor-detection

Description:

Automatic humor detection has interesting use cases in modern technologies, such as chatbots and virtual assistants. Existing humor detection datasets usually combine formal non-humorous texts with informal jokes whose statistics (text length, word count, etc.) are incompatible. This makes it more likely that humor can be detected with simple analytical models, without understanding the underlying latent linguistic features and structures.

We introduce a new combined dataset for the task of humor detection, entitled “ColBERT dataset”, which contains 200k labeled short texts, equally distributed between humor and non-humor. We reduced or completely removed the issues of the existing datasets in the new dataset. The dataset is much larger than the previous datasets and includes texts with similar textual features. The correlation between character count and the target is insignificant (+0.09), and there is no notable connection between the target value and sentiment features (correlation coefficients of -0.09 and +0.02 for polarity and subjectivity, respectively).
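The character-count check described above can be illustrated with a minimal sketch. It assumes the Pearson coefficient between text length and a 0/1 humor label is the statistic in question, which this listing does not confirm. The helper names (`pearson`, `char_count_correlation`) and the four toy rows are hypothetical and are not taken from the dataset, so the printed value says nothing about the reported +0.09.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def char_count_correlation(rows):
    """Correlate character count with a binary label.

    `rows` is an iterable of (text, is_humor) pairs; this layout is an
    assumption made for the sketch, not the dataset's documented schema.
    """
    rows = list(rows)
    lengths = [float(len(text)) for text, _ in rows]
    labels = [1.0 if is_humor else 0.0 for _, is_humor in rows]
    return pearson(lengths, labels)


# Hypothetical toy rows, invented for illustration only.
toy_rows = [
    ("Why did the scarecrow win an award? He was outstanding in his field.", True),
    ("The city council approved the new budget on Tuesday evening.", False),
    ("I told my computer I needed a break, and it froze.", True),
    ("Scientists report a new method for recycling plastic waste.", False),
]

print(f"character count vs. label: r = {char_count_correlation(toy_rows):+.2f}")
```

A coefficient close to zero on the full data would indicate that text length alone carries little information about the label, which is the property the description claims for this dataset.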

Provider:

IEEE DataPort

Created:

2021-03-09
