English-Hindi Code-Mixed Sarcasm Tweet Dataset

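Name: English-Hindi Code-Mixed Sarcasm Tweet Dataset
Creator: Language Technology Research Center, International Institute of Information Technology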
Published: 2018-05-30 17:08:54
License: No description available

Source: arXiv. Updated: 2018-05-30. Indexed: 2024-06-21.

Download link:

https://github.com/sahilswami96/SarcasmDetectionCodeMixed

Description:

The English-Hindi Code-Mixed Sarcasm Tweet Dataset was created by the Language Technology Research Center of the International Institute of Information Technology to support sarcasm detection on social media. It contains 5,250 tweets, each annotated for whether it is sarcastic and tagged for language. The dataset was built by extracting tweets that contain specific hashtags, then manually filtering and annotating them. It is intended as a resource for sarcasm detection and language identification. Application areas include opinion mining and sentiment analysis for commercial and security services, particularly where sarcastic expressions on social media must be handled.
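
The construction and annotation scheme described above can be pictured with a minimal Python sketch. Everything named here is an assumption made for illustration: the hashtag set, the `AnnotatedTweet` record, the language tag names `hi` and `en`, and the sample text are hypothetical and are not taken from the dataset or its repository, whose actual file format is not described in this listing.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Placeholder hashtags. The listing only says "specific tags",
# so these values are assumed for illustration.
CANDIDATE_HASHTAGS: Set[str] = {"#sarcasm", "#irony"}


@dataclass
class AnnotatedTweet:
    """Hypothetical record: tokens paired with a language tag, plus a sarcasm label."""
    tokens: List[Tuple[str, str]]  # (token, language tag); tag names are assumed
    sarcastic: bool


def has_candidate_hashtag(text: str, hashtags: Set[str] = CANDIDATE_HASHTAGS) -> bool:
    """Return True if any whitespace-separated word matches a candidate hashtag."""
    words = {w.lower().rstrip(".,!?") for w in text.split()}
    return not words.isdisjoint(hashtags)


def language_mix(tweet: AnnotatedTweet) -> Dict[str, int]:
    """Count how many tokens carry each language tag."""
    counts: Dict[str, int] = {}
    for _, lang in tweet.tokens:
        counts[lang] = counts.get(lang, 0) + 1
    return counts


if __name__ == "__main__":
    # Step 1: automatic candidate extraction by hashtag.
    print(has_candidate_hashtag("Wah kya baat hai #Sarcasm"))
    # Step 2 (manual in the dataset): a human assigns the sarcasm label and language tags.
    example = AnnotatedTweet(
        tokens=[("kya", "hi"), ("great", "en"), ("hai", "hi")],
        sarcastic=True,
    )
    print(language_mix(example))
```

The hashtag match only selects candidates. According to the description, the sarcasm label and language tags come from manual screening and annotation, which is why the sketch stores them as fields and does not compute them.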

Provider:

Language Technology Research Center, International Institute of Information Technology

Created:

2018-05-30