Caryslara456/SARC_Sarcasm

Name: Caryslara456/SARC_Sarcasm
Creator: Caryslara456
Published: 2026-01-28 16:32:04
License: No description available

Source: Hugging Face. Updated: 2026-01-28. Indexed: 2026-03-29.

Download link:

https://hf-mirror.com/datasets/Caryslara456/SARC_Sarcasm

Resource description:

---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: author
    dtype: string
  - name: score
    dtype: int64
  - name: ups
    dtype: int64
  - name: downs
    dtype: int64
  - name: date
    dtype: string
  - name: created_utc
    dtype: int64
  - name: subreddit
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 1764500045
    num_examples: 12704751
  download_size: 903559115
  dataset_size: 1764500045
license: cc-by-2.0
---

# SARC_Sarcasm

## Dataset Description

- **Paper:** [A Large Self-Annotated Corpus for Sarcasm](http://www.lrec-conf.org/proceedings/lrec2018/pdf/160.pdf)

## Dataset Summary

This dataset is a large corpus for sarcasm research and for training and evaluating sarcasm detection systems. The corpus contains 1.3 million sarcastic statements, ten times more than any previous dataset, and many more instances of non-sarcastic statements. This allows for learning in both balanced and unbalanced label regimes. Each statement is self-annotated: sarcasm is labeled by the author, not by an independent annotator, and each statement is accompanied by user, topic, and conversation context. The original paper evaluates the accuracy of the corpus, establishes benchmarks for sarcasm detection, and assesses baseline methods.

For the details of this dataset, see the original [paper](http://www.lrec-conf.org/proceedings/lrec2018/pdf/160.pdf).

Metadata in Creative Language Toolkit ([CLTK](https://github.com/liyucheng09/cltk))

- CL Type: Sarcasm
- Task Type: detection
- Size: 1.3M
- Created time: 2018

### Contributions

If you have any queries, please open an issue or send them by [mail](mailto:yucheng.li@surrey.ac.uk).
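### Loading sketch

The feature list in the metadata above can be turned into a small schema check, which may help when inspecting rows before committing to the full download of roughly 900 MB. The following is a minimal sketch, not an official loader: the repository id is taken from this page, the helper names `check_record` and `stream_train` are hypothetical, and it assumes the Hugging Face `datasets` library is installed and that this copy loads with the library's default loader.

```python
from typing import Any, Dict, Iterator

# Column names and dtypes copied from the dataset_info block of the card
# (string -> str, int64 -> int).
EXPECTED_FEATURES: Dict[str, type] = {
    "text": str,
    "author": str,
    "score": int,
    "ups": int,
    "downs": int,
    "date": str,
    "created_utc": int,
    "subreddit": str,
    "id": str,
}


def check_record(record: Dict[str, Any]) -> bool:
    """Return True if the record has exactly the declared columns with matching types."""
    if set(record) != set(EXPECTED_FEATURES):
        return False
    return all(isinstance(record[name], kind) for name, kind in EXPECTED_FEATURES.items())


def stream_train(limit: int = 5) -> Iterator[Dict[str, Any]]:
    """Yield the first `limit` rows of the train split without downloading it all."""
    # Imported lazily so the schema helper above works without the library installed.
    from datasets import load_dataset

    # Assumption: the repository id shown on this page resolves on the Hub.
    rows = load_dataset("Caryslara456/SARC_Sarcasm", split="train", streaming=True)
    for index, row in enumerate(rows):
        if index >= limit:
            break
        yield row


if __name__ == "__main__":
    for row in stream_train():
        print(check_record(row), row["subreddit"], row["text"][:80])
```

Streaming is used here because the train split lists 12,704,751 examples. Rows with missing values would fail `check_record`, which is intentional for a first inspection pass.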

Provider:

Caryslara456
