
iSarcasm

OpenDataLab. Updated: 2026-05-17. Indexed: 2024-05-09.
Download link:
https://opendatalab.org.cn/OpenDataLab/iSarcasm
Resource description:
iSarcasm is a dataset of tweets, each labeled as either sarcastic or non-sarcastic. Each sarcastic tweet is further labeled with one of the following types of ironic speech:

1. Sarcasm: tweets that contradict the state of affairs and are critical towards an addressee;
2. Irony: tweets that contradict the state of affairs but are not obviously critical towards an addressee;
3. Satire: tweets that appear to support an addressee but contain underlying disagreement and mockery;
4. Understatement: tweets that undermine the importance of the state of affairs they refer to;
5. Overstatement: tweets that describe the state of affairs in obviously exaggerated terms;
6. Rhetorical question: tweets that include a question whose invited inference (implicature) obviously contradicts the state of affairs.

Each sarcastic tweet also carries two further annotations: an English sentence explaining why the tweet is sarcastic, and a rephrasing that conveys the same meaning non-sarcastically. Both are provided by the tweet's author.

iSarcasm contains 4,484 tweets, of which 777 are labeled sarcastic and 3,707 non-sarcastic. Two files are provided, isarcasm_train.csv and isarcasm_test.csv, containing a randomly selected 80% and 20% of the examples, respectively. Each row has the format tweet_id, sarcasm_label, sarcasm_type, where sarcasm_type is defined only for sarcastic tweets, as described above.
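The three-column row format described above can be read with Python's standard csv module. The following is a minimal sketch, not the dataset's official loader. It assumes the files start with a header row, and the label and type strings in the inline sample ("sarcastic", "not_sarcastic", "irony", "rhetorical_question") are made up for illustration; the description does not state how the values are actually encoded, so the code only tallies whatever strings it finds.

```python
import csv
import io
from collections import Counter

# Column order as documented: tweet_id, sarcasm_label, sarcasm_type.
FIELDS = ["tweet_id", "sarcasm_label", "sarcasm_type"]


def read_rows(handle, has_header=True):
    """Yield one dict per CSV row, keyed by the three documented columns."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # assumption: the first line is a header
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Pad short rows so sarcasm_type is "" when the column is missing.
        row = (row + [""] * len(FIELDS))[: len(FIELDS)]
        yield dict(zip(FIELDS, (cell.strip() for cell in row)))


def summarize(rows):
    """Count label values, and type values for rows that define a type."""
    labels, types = Counter(), Counter()
    for row in rows:
        labels[row["sarcasm_label"]] += 1
        if row["sarcasm_type"]:
            types[row["sarcasm_type"]] += 1
    return labels, types


# Hypothetical sample rows; real tweet IDs and value encodings may differ.
SAMPLE = """tweet_id,sarcasm_label,sarcasm_type
1001,sarcastic,irony
1002,not_sarcastic,
1003,sarcastic,rhetorical_question
1004,not_sarcastic,
"""

labels, types = summarize(read_rows(io.StringIO(SAMPLE)))
print(dict(labels))
print(dict(types))
```

To run the same tally on a real file, open it with `open("isarcasm_train.csv", newline="", encoding="utf-8")` and pass the handle to `read_rows` in place of the `io.StringIO` sample. Note that the rows carry tweet IDs only, so the tweet text has to be obtained separately.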
Provider:
OpenDataLab
Created:
2022-06-23
Collected summary
Dataset introduction
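Background and challenges
Background overview
iSarcasm is a dataset of 4,484 tweets, each labeled as sarcastic or non-sarcastic. Sarcastic tweets are further divided into several types and come with an English explanation and a non-sarcastic rephrasing. The dataset is split into a training set and a test set and is suited to sarcasm detection and sentiment analysis tasks.
The summary above was collected and generated by 遇见数据集.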
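For sarcasm detection, the label counts given in the description imply a noticeable class imbalance. The sketch below works through that arithmetic using only the stated totals (4,484 tweets, 777 sarcastic, 3,707 non-sarcastic). It is computed over the full dataset, since per-split counts are not given, and the "always predict non-sarcastic" baseline is an illustrative assumption rather than a result reported for the dataset.

```python
# Counts taken from the dataset description.
TOTAL = 4484
SARCASTIC = 777
NON_SARCASTIC = 3707

# Sanity check: the two classes account for every tweet.
assert SARCASTIC + NON_SARCASTIC == TOTAL

sarcastic_share = SARCASTIC / TOTAL
# Accuracy of a trivial classifier that always answers "non-sarcastic".
majority_accuracy = NON_SARCASTIC / TOTAL


def f1(tp, fp, fn):
    """F1 score from true positives, false positives, false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# The trivial classifier never predicts "sarcastic": no true or false
# positives, and every sarcastic tweet becomes a false negative.
baseline_f1 = f1(tp=0, fp=0, fn=SARCASTIC)

print(f"sarcastic share:   {sarcastic_share:.1%}")
print(f"majority accuracy: {majority_accuracy:.1%}")
print(f"sarcastic-class F1 of the majority baseline: {baseline_f1:.2f}")
```

Because the trivial baseline reaches high accuracy while never detecting a sarcastic tweet, a per-class metric such as F1 on the sarcastic class is likely a more informative measure than accuracy on this data.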