
Caryslara456/SARC_Sarcasm

Hugging Face. Updated 2026-01-28; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Caryslara456/SARC_Sarcasm
Resource description:
---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: author
    dtype: string
  - name: score
    dtype: int64
  - name: ups
    dtype: int64
  - name: downs
    dtype: int64
  - name: date
    dtype: string
  - name: created_utc
    dtype: int64
  - name: subreddit
    dtype: string
  - name: id
    dtype: string
  splits:
  - name: train
    num_bytes: 1764500045
    num_examples: 12704751
  download_size: 903559115
  dataset_size: 1764500045
license: cc-by-2.0
---

# SARC_Sarcasm

## Dataset Description

- **Paper:** [A Large Self-Annotated Corpus for Sarcasm](http://www.lrec-conf.org/proceedings/lrec2018/pdf/160.pdf)

## Dataset Summary

SARC is a large corpus for sarcasm research and for training and evaluating sarcasm detection systems. It contains 1.3 million sarcastic statements, ten times more than any previous dataset, and many more non-sarcastic statements, which allows learning in both balanced and unbalanced label regimes. Each statement is self-annotated: sarcasm is labeled by the author, not by an independent annotator. Each statement is also accompanied by user, topic, and conversation context. The paper evaluates the accuracy of the corpus, establishes benchmarks for sarcasm detection, and assesses baseline methods.

For details of this dataset, see the original [paper](http://www.lrec-conf.org/proceedings/lrec2018/pdf/160.pdf).

Metadata in Creative Language Toolkit ([CLTK](https://github.com/liyucheng09/cltk))

- CL Type: Sarcasm
- Task Type: detection
- Size: 1.3M
- Created time: 2018

### Contributions

If you have any queries, please open an issue or send them by [mail](mailto:yucheng.li@surrey.ac.uk).
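The column list in the `dataset_info` block can be turned into a small row check. The following is a minimal sketch, not part of the original card: the helper names `check_row`, `created_at`, and `stream_train` are hypothetical, and it assumes that `created_utc` holds Unix epoch seconds and that the repository can be read with the Hugging Face `datasets` library. Neither assumption is confirmed by the card itself.

```python
from datetime import datetime, timezone
from typing import Any, Dict

# Repository id as shown in the listing above.
REPO_ID = "Caryslara456/SARC_Sarcasm"

# Column names and Python-side types, copied from the dataset_info block
# (string -> str, int64 -> int).
FEATURES = {
    "text": str,
    "author": str,
    "score": int,
    "ups": int,
    "downs": int,
    "date": str,
    "created_utc": int,
    "subreddit": str,
    "id": str,
}


def check_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return the row unchanged if it matches FEATURES, otherwise raise."""
    missing = FEATURES.keys() - row.keys()
    if missing:
        raise KeyError(f"missing columns: {sorted(missing)}")
    for name, expected in FEATURES.items():
        if not isinstance(row[name], expected):
            raise TypeError(f"{name!r} should be {expected.__name__}")
    return row


def created_at(row: Dict[str, Any]) -> datetime:
    """Convert created_utc to an aware datetime.

    Assumption: created_utc is Unix epoch seconds; the card does not say so.
    """
    return datetime.fromtimestamp(row["created_utc"], tz=timezone.utc)


def stream_train():
    """Open the train split lazily instead of downloading all of it.

    Assumption: the repository is loadable with `datasets`; the import is
    local so the helpers above work without the library installed.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID, split="train", streaming=True)
```

Streaming is used here because the card reports a download size of roughly 900 MB for the single `train` split; iterating over `stream_train()` and passing each row through `check_row` would be one way to inspect a few examples before committing to a full download.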
Provided by:
Caryslara456