iSarcasm
OpenDataLab, updated 2026-05-17, indexed 2024-05-09
Download link:
https://opendatalab.org.cn/OpenDataLab/iSarcasm
Description:
The iSarcasm dataset is a tweet corpus where each tweet is labeled as either sarcastic or non-sarcastic. Each sarcastic tweet is further categorized into one of the following sarcasm types:
1. Sarcasm: tweets that contradict the state of affairs and are critical towards an addressee;
2. Irony: tweets that contradict the state of affairs but are not obviously critical towards an addressee;
3. Satire: tweets that appear to support an addressee but contain underlying disagreement and mockery;
4. Understatement: tweets that undermine the importance of the state of affairs they refer to;
5. Overstatement: tweets that describe the state of affairs in obviously exaggerated terms;
6. Rhetorical question: tweets that include a question whose invited inference (implicature) obviously contradicts the state of affairs.
For each sarcastic tweet, additional annotations include an English sentence explaining why the tweet is sarcastic, and a non-sarcastic rephrasing that conveys the same meaning. Both annotations are provided by the tweet's author.
The iSarcasm dataset contains 4,484 tweets, of which 777 are labeled sarcastic and 3,707 non-sarcastic. Two files are provided, isarcasm_train.csv and isarcasm_test.csv, containing a randomly chosen 80% and 20% of the examples, respectively. Each row has the format tweet_id,sarcasm_label,sarcasm_type, where sarcasm_type is defined only for sarcastic tweets, as described above.
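The row format described above can be read with Python's standard csv module. The following is a minimal sketch, not an official loader: the presence of a header row, the label strings (sarcastic, not_sarcastic), the spelling of the type values, and the tweet IDs in the sample are all assumptions made for illustration, so check them against the actual files.

```python
import csv
import io
from collections import Counter

# Column order as described in the dataset description.
FIELDS = ["tweet_id", "sarcasm_label", "sarcasm_type"]


def read_isarcasm(lines):
    """Parse rows of the form tweet_id,sarcasm_label,sarcasm_type.

    `lines` is any iterable of text lines (an open file or io.StringIO).
    A missing or empty sarcasm_type is returned as None.
    """
    rows = []
    for record in csv.reader(lines):
        # Skip blank lines and a header row, if one is present (assumption).
        if not record or record[0] == "tweet_id":
            continue
        # Pad short rows so that a missing sarcasm_type becomes "".
        record = (record + [""] * len(FIELDS))[: len(FIELDS)]
        row = dict(zip(FIELDS, record))
        row["sarcasm_type"] = row["sarcasm_type"] or None
        rows.append(row)
    return rows


# Hypothetical sample; IDs and label spellings are illustrative only.
SAMPLE = """tweet_id,sarcasm_label,sarcasm_type
1001,sarcastic,overstatement
1002,not_sarcastic,
1003,sarcastic,rhetorical_question
1004,not_sarcastic
"""

rows = read_isarcasm(io.StringIO(SAMPLE))
label_counts = Counter(r["sarcasm_label"] for r in rows)
type_counts = Counter(r["sarcasm_type"] for r in rows if r["sarcasm_type"])
print(label_counts)
print(type_counts)
```

To read a real file, pass an open file handle instead of the sample, for example open("isarcasm_train.csv", newline="", encoding="utf-8"). Note that the files carry tweet IDs rather than tweet text, so the text has to be obtained separately.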
Provider:
OpenDataLab
Created:
2022-06-23
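Background and challenges
Background overview
iSarcasm is a dataset of 4,484 tweets, each labeled as sarcastic or non-sarcastic. Sarcastic tweets are further divided into several types and come with an English explanation and a non-sarcastic rephrasing. The dataset is split into a training set and a test set and is suited to sarcasm detection and sentiment analysis tasks.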
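Since only 777 of the 4,484 tweets are sarcastic, the two classes are clearly imbalanced, which matters for sarcasm detection. The sketch below works out the class shares from the stated counts and derives inverse-frequency class weights. The weighting formula n / (k * n_c) is a common heuristic chosen here for illustration, not something the dataset prescribes, and the label names are assumptions.

```python
# Counts as stated in the dataset description.
counts = {"sarcastic": 777, "not_sarcastic": 3707}

total = sum(counts.values())
num_classes = len(counts)

# Share of each class in the full dataset.
shares = {label: n / total for label, n in counts.items()}

# Inverse-frequency ("balanced") weights: total / (num_classes * class_count).
# This is an illustrative heuristic, not part of the dataset itself.
weights = {label: total / (num_classes * n) for label, n in counts.items()}

for label in counts:
    print(f"{label}: share={shares[label]:.3f}, weight={weights[label]:.3f}")
```

With these counts the sarcastic class makes up roughly 17% of the data, so a classifier that always predicts non-sarcastic would already reach about 83% accuracy. Metrics such as F1 on the sarcastic class are therefore more informative than accuracy.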



