CreativeLang/ColBERT_Humor_Detection

Name: CreativeLang/ColBERT_Humor_Detection
Creator: CreativeLang
Published: 2023-07-06 19:58:02
License: Not specified

Source: Hugging Face. Updated 2023-07-06; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/CreativeLang/ColBERT_Humor_Detection
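The following is a minimal sketch for loading the dataset in Python. It assumes that the Hugging Face `datasets` library is installed (`pip install datasets`) and that network access is available. `hf-mirror.com` is a mirror of the Hugging Face Hub, and the `HF_ENDPOINT` environment variable tells the Hub client to download through it. The helper name `load_colbert_humor` is hypothetical. This page documents neither split names nor column names, so inspect the returned object before relying on either.

```python
import os

# Route Hub downloads through the mirror listed above. This must be set before
# `datasets` (and therefore `huggingface_hub`) is first imported.
# setdefault keeps any endpoint the user has already configured.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

DATASET_ID = "CreativeLang/ColBERT_Humor_Detection"


def load_colbert_humor():
    """Hypothetical helper: download the dataset and return a DatasetDict of all splits."""
    from datasets import load_dataset  # imported lazily so the env var above takes effect

    return load_dataset(DATASET_ID)
```

Calling `load_colbert_humor()` and printing the result lists the available splits, their row counts, and their column names.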


Resource summary:

---
license: cc-by-2.0
---

# ColBERT_Humor

## Dataset Description

- **Paper:** [ColBERT: Using BERT sentence embedding for humor detection](https://arxiv.org/abs/2004.12765)

## Dataset Summary

ColBERT Humor contains 200,000 labeled short texts, split equally between humorous and non-humorous content. The dataset was created to overcome the limitations of earlier humor detection datasets. In those datasets, the two classes differed in text length, word count, and formality, so simple models could classify them accurately without capturing humor at all.

The texts come from two sources: the News Category dataset, which contains 200k news headlines from the Huffington Post (2012-2018), and a collection of 231,657 Reddit jokes. The texts were preprocessed so that the two classes are syntactically similar. Models must therefore rely on linguistic content rather than surface features to distinguish humor, which makes the dataset a more demanding benchmark for humor detection research.

For details, see the original [paper](https://arxiv.org/abs/2004.12765).

Metadata in Creative Language Toolkit ([CLTK](https://github.com/liyucheng09/cltk)):

- CL Type: Humor
- Task Type: detection
- Size: 200k
- Created time: 2020

### Citation Information

If you find this dataset helpful, please cite:

```
@article{annamoradnejad2020colbert,
  title={Colbert: Using bert sentence embedding for humor detection},
  author={Annamoradnejad, Issa and Zoghi, Gohar},
  journal={arXiv preprint arXiv:2004.12765},
  year={2020}
}
```

### Contributions

If you have any questions, please open an issue or email [yucheng.li@surrey.ac.uk](mailto:yucheng.li@surrey.ac.uk).
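The summary makes two claims that can be checked directly. First, the classes are equally sized. Second, surface cues such as length no longer separate them. The sketch below tests both claims on plain lists of texts and labels. It assumes that labels are hashable values (for example, booleans). The function names are hypothetical, and so are the column names `text` and `humor` used in the usage note, because this page does not document the schema.

```python
from collections import Counter, defaultdict
from collections.abc import Hashable, Sequence


def class_balance(labels: Sequence[Hashable]) -> dict:
    """Return each label's share of all examples (an empty input gives {})."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def is_balanced(labels: Sequence[Hashable], tolerance: float = 0.01) -> bool:
    """True if there are exactly two classes whose shares differ by at most `tolerance`."""
    shares = class_balance(labels)
    if len(shares) != 2:
        return False
    first, second = shares.values()
    return abs(first - second) <= tolerance


def mean_word_count_by_label(texts: Sequence[str], labels: Sequence[Hashable]) -> dict:
    """Average whitespace-separated word count per label.

    A large gap between labels would be a surface cue that a simple
    length-based model could exploit.
    """
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    word_totals = defaultdict(int)
    counts = Counter()
    for text, label in zip(texts, labels):
        word_totals[label] += len(text.split())
        counts[label] += 1
    return {label: word_totals[label] / counts[label] for label in counts}
```

Suppose a loaded split has those assumed columns. Then `is_balanced(split["humor"])` checks the 50/50 claim, and `mean_word_count_by_label(split["text"], split["humor"])` shows whether word count alone still separates the classes. Whitespace splitting is a deliberately crude tokenizer, chosen only because it needs no dependencies.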

Provided by:

CreativeLang

Summary of original information