ColBERT dataset - 200k short texts for humor detection
Source: DataCite Commons. Updated: 2021-03-09. Indexed: 2025-04-16.
Download link:
https://ieee-dataport.org/documents/colbert-dataset-200k-short-texts-humor-detection
Description:
Automatic humor detection has interesting use cases in modern technologies, such as chatbots and virtual assistants. Existing humor detection datasets usually combine formal non-humorous texts and informal jokes with incompatible statistics (text length, word count, etc.). This makes it possible to detect humor with simple analytical models, without understanding the underlying latent linguistic features and structures. We introduce a new combined dataset for the task of humor detection, entitled “ColBERT dataset”, which contains 200k labeled short texts, equally distributed between humor and non-humor. We reduced or completely removed the issues of the existing datasets in the new dataset. The dataset is much larger than previous datasets and includes texts with similar textual features. The correlation between character count and the target is insignificant (+0.09), and there is no notable connection between the target value and sentiment features (correlation coefficients of -0.09 and +0.02 for polarity and subjectivity, respectively).
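The description reports a correlation between character count and the target but does not say which coefficient was used. The sketch below assumes a plain Pearson coefficient (which equals the point-biserial correlation when the target is binary) and shows how such a length-versus-label check could be computed. The function names and the toy rows are hypothetical and are not taken from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Unnormalized covariance and variances; the 1/n factors cancel out.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def length_label_correlation(texts, labels):
    """Correlate character count with a binary humor label (True = humor)."""
    char_counts = [len(text) for text in texts]
    targets = [1 if is_humor else 0 for is_humor in labels]
    return pearson(char_counts, targets)


if __name__ == "__main__":
    # Toy rows invented for illustration only; they are NOT from the dataset.
    toy_texts = [
        "Why did the chicken cross the road?",
        "The council approved the new budget on Monday.",
        "I told my computer a joke, but it did not get it.",
        "Rain is expected across the region this weekend.",
    ]
    toy_labels = [True, False, True, False]
    r = length_label_correlation(toy_texts, toy_labels)
    print(f"character count vs. label: {r:+.2f}")
```

A coefficient near zero on the real data, such as the reported +0.09, would suggest that text length alone says little about the label, which is the property the description emphasizes.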
Provider:
IEEE DataPort
Created:
2021-03-09



