English-Hindi Code-Mixed Humor Detection Corpus
Source: arXiv; updated 2018-06-14; indexed 2024-06-21
Download link:
https://github.com/Ankh2295/humor-detection-corpus
Description:
The English-Hindi Code-Mixed Humor Detection Corpus was created by the Language Technologies Research Centre at the International Institute of Information Technology, Hyderabad. It contains 3,543 code-mixed English-Hindi tweets written in Latin script, each manually labeled as humorous or non-humorous. Every token also carries a language tag identifying its source language (English or Hindi). The corpus was built to support automatic humor detection in code-mixed social media text, particularly for multilingual regions such as South Asia.
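The annotation scheme described above (one humor label per tweet, one language tag per token) can be pictured with a small sketch. Everything in it is an assumption made for illustration: the `AnnotatedTweet` class, the `"en"`/`"hi"` tag strings, the `hindi_fraction` helper, and the example tweet are hypothetical and do not reflect the corpus's actual file format or contents, which should be checked in the repository.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class AnnotatedTweet:
    """Hypothetical in-memory form of one corpus record."""
    tokens: List[Tuple[str, str]]  # (token, language tag); tag strings are assumed
    humorous: bool                 # tweet-level humor label


def hindi_fraction(tweet: AnnotatedTweet) -> float:
    """Share of English/Hindi-tagged tokens that are tagged as Hindi."""
    tags = [lang for _, lang in tweet.tokens if lang in ("en", "hi")]
    if not tags:
        return 0.0
    return sum(1 for lang in tags if lang == "hi") / len(tags)


# Made-up example, not taken from the corpus.
example = AnnotatedTweet(
    tokens=[
        ("yaar", "hi"), ("this", "en"), ("movie", "en"),
        ("bahut", "hi"), ("boring", "en"), ("hai", "hi"),
    ],
    humorous=False,
)

print(hindi_fraction(example))
```

A per-tweet mixing ratio like this is one simple feature that token-level language tags make possible; it is shown only as an example of how the two annotation layers can be combined.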
Provider:
Language Technologies Research Centre, International Institute of Information Technology, Hyderabad
Created:
2018-06-14



