MarathiSarc

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://data.mendeley.com/datasets/273n4sr2z3
Description:
Sarcasm detection is the task of predicting whether a given text is sarcastic or not. Because sarcasm is hard to detect in sentiment-bearing text, sarcasm detection has become an active research area in Natural Language Processing. A considerable amount of research has been done in this area for foreign languages such as English, Czech, Italian, Dutch and Indonesian, and a smaller body of work exists for Indian languages such as Hindi, Tamil and Bengali. However, Marathi, the third most widely spoken language in India, lags far behind in this area. One of the most crucial reasons is the absence of a suitable dataset.

We present MarathiSarc, a dataset of labelled Marathi tweets for sarcasm detection. Considering the limitations of the Twitter API, we used the Twint library to collect the tweets, and with it we collected 2361 tweets in Marathi. In the first stage, using a hashtag-based supervision technique, we collected Marathi tweets containing hashtags such as #sarcasm, #sarcastic, #sarcasmic, #irony and #ironic. The time period of the corpus starts in December 2011. We manually labelled the entire dataset into three classes as follows:
• Tweets that contain hashtags such as #sarcasm, #sarcastic, #sarcasmic, #व्यंग, #irony or #ironic and are found to be actually sarcastic are labelled sarcastic (1).
• Tweets that contain hashtags such as #sarcasm, #sarcastic, #sarcasmic, #व्यंग, #irony or #ironic but are found to be actually non-sarcastic are labelled non-sarcastic (-1).
• Tweets that may be sarcastic depending on the conversational history and context are labelled possibly sarcastic.
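As a rough illustration of the hashtag-based supervision step, the Python sketch below flags a tweet as a labelling candidate when it carries one of the hashtags listed in the description. The names `hashtags`, `is_candidate`, `SARCASM_HASHTAGS` and the tokenisation rule are hypothetical and are not part of MarathiSarc or Twint; the sketch does not fetch tweets. A hashtag match only makes a tweet a candidate: as the labelling scheme shows, such tweets can still turn out to be non-sarcastic, which is why every tweet was labelled by hand.

```python
import re
import unicodedata

# Hashtags named in the MarathiSarc description (#व्यंग is the Marathi tag).
SARCASM_HASHTAGS = {
    unicodedata.normalize("NFC", tag)
    for tag in ("#sarcasm", "#sarcastic", "#sarcasmic", "#व्यंग", "#irony", "#ironic")
}

# Assumed tokenisation (hypothetical): "#" followed by anything except whitespace,
# another "#", or common punctuation (including the Devanagari danda "।").
# \w is avoided because Python's \w does not match Devanagari combining signs
# such as the virama in "व्यंग", which would truncate the tag.
HASHTAG_RE = re.compile(r"#[^\s#.,!?;:।]+")


def hashtags(text: str) -> set[str]:
    """Return the NFC-normalised, lower-cased hashtags found in a tweet."""
    text = unicodedata.normalize("NFC", text)
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def is_candidate(text: str) -> bool:
    """True if the tweet carries at least one sarcasm/irony hashtag."""
    return not hashtags(text).isdisjoint(SARCASM_HASHTAGS)


if __name__ == "__main__":
    for tweet in ("किती छान! #व्यंग", "आज पाऊस पडला #weather"):
        print(is_candidate(tweet), tweet)
```

Lower-casing and NFC normalisation are design choices made so that variants such as "#Sarcasm" and differently encoded Devanagari still match the same tag; the actual MarathiSarc collection pipeline may have handled this differently.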
Created:
2024-12-16