CreativeLang/EPIC_Irony
Source: Hugging Face. Updated: 2023-07-11. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/CreativeLang/EPIC_Irony
Resource description:
---
dataset_info:
  features:
  - name: user
    dtype: string
  - name: label
    dtype: string
  - name: timestamp
    dtype: string
  - name: source
    dtype: string
  - name: subreddit
    dtype: string
  - name: id_original
    dtype: string
  - name: text
    dtype: string
  - name: parent_id_original
    dtype: string
  - name: parent_text
    dtype: string
  - name: Language_instance
    dtype: string
  - name: Language_variety
    dtype: string
  - name: Age
    dtype: string
  - name: Sex
    dtype: string
  - name: Ethnicity simplified
    dtype: string
  - name: Country of birth
    dtype: string
  - name: Country of residence
    dtype: string
  - name: Nationality
    dtype: string
  - name: Language_annotator
    dtype: string
  - name: Student status
    dtype: string
  - name: Employment status
    dtype: string
  splits:
  - name: train
    num_bytes: 7299373
    num_examples: 14172
  download_size: 1038853
  dataset_size: 7299373
---
# EPIC_Irony
- paper: [EPIC: Multi-Perspective Annotation of a Corpus of Irony](https://assets.amazon.science/40/b4/0f6ec06a4a33a44485de1b2b57c7/epic-multi-perspective-annotation-of-a-corpus-of-irony.pdf) at ACL 2023
Key features:
- EPIC (English Perspectivist Irony Corpus) is an annotated corpus for irony analysis based on data perspectivism principles.
- The corpus contains social media conversations in five regional varieties of English, annotated by contributors from corresponding countries.
- The dataset explores the perspectives of annotators, taking into account their origin, age, and gender.
- Perspective-aware models were created to validate EPIC, and these proved more effective and confident in identifying irony than non-perspectivist models.
- The models showcase variation in irony perception across different demographic groups.
- EPIC serves as a valuable resource for training perspective-aware models for irony detection.
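
To make the perspectivist idea concrete, the following minimal sketch computes the share of annotations marked ironic within each annotator group. It uses the column names from the dataset schema (`label`, `Sex`, `Age`, `id_original`), but the rows are invented for illustration, and the label value `"iro"` is an assumption about how ironic annotations are encoded; adjust `positive_label` to match the actual data. The function name `irony_rate_by_group` is hypothetical and is not taken from the paper's code.

```python
from collections import defaultdict


def irony_rate_by_group(rows, group_key, positive_label="iro"):
    """Return {group value: share of annotations labelled ironic}.

    `positive_label` is an assumption about the label encoding.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [ironic count, total count]
    for row in rows:
        tally = counts[row[group_key]]
        tally[0] += row["label"] == positive_label
        tally[1] += 1
    return {group: ironic / total for group, (ironic, total) in counts.items()}


# Hypothetical rows, invented for illustration only (not real EPIC data).
toy_rows = [
    {"id_original": "a1", "label": "iro", "Sex": "Female", "Age": "24"},
    {"id_original": "a1", "label": "not", "Sex": "Male", "Age": "41"},
    {"id_original": "a1", "label": "iro", "Sex": "Female", "Age": "35"},
    {"id_original": "b2", "label": "not", "Sex": "Male", "Age": "29"},
]

rates = irony_rate_by_group(toy_rows, "Sex")
print(rates)
```

Because each row is one annotator's judgement rather than an aggregated gold label, the same `id_original` can appear with different labels, and grouping by any annotator attribute (for example `Nationality` or `Language_annotator`) works the same way.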
Metadata in Creative Language Toolkit ([CLTK](https://github.com/liyucheng09/cltk))
- CL Type: Irony
- Task Type: detection
- Size: 14k
- Created time: 2023
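
As a rough sketch of how the corpus might be loaded and sanity-checked, the snippet below lists the 20 columns declared in the dataset card and defines a small helper that reports any that are missing. The helper names `missing_columns` and `load_epic` are hypothetical; `load_epic` assumes the Hugging Face `datasets` package is installed and that the repository `CreativeLang/EPIC_Irony` is reachable.

```python
# Column names as declared in the dataset card's `features` list.
EXPECTED_COLUMNS = [
    "user", "label", "timestamp", "source", "subreddit",
    "id_original", "text", "parent_id_original", "parent_text",
    "Language_instance", "Language_variety", "Age", "Sex",
    "Ethnicity simplified", "Country of birth", "Country of residence",
    "Nationality", "Language_annotator", "Student status",
    "Employment status",
]


def missing_columns(column_names):
    """Return the expected columns that are absent from `column_names`."""
    present = set(column_names)
    return [name for name in EXPECTED_COLUMNS if name not in present]


def load_epic():
    """Download the single `train` split (requires network access)."""
    # Imported lazily so the helpers above can be used without `datasets`.
    from datasets import load_dataset

    return load_dataset("CreativeLang/EPIC_Irony", split="train")
```

A typical check would be `missing_columns(load_epic().column_names)`, which should return an empty list if the hosted data matches the card. Note that several column names contain spaces, so they need bracket access (`row["Country of birth"]`) rather than attribute access.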
Provider:
CreativeLang
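Summary of original information
Dataset overview
Dataset name
- Name: EPIC_Irony
Dataset features
- Feature list:
  - user
  - label
  - timestamp
  - source
  - subreddit
  - id_original
  - text
  - parent_id_original
  - parent_text
  - Language_instance
  - Language_variety
  - Age
  - Sex
  - Ethnicity simplified
  - Country of birth
  - Country of residence
  - Nationality
  - Language_annotator
  - Student status
  - Employment status
- Data type: all features are strings (`dtype: string`)
Dataset splits
- Splits:
  - Name: train
  - Size:
    - Bytes: 7299373
    - Examples: 14172
Dataset size
- Download size: 1038853 bytes
- Dataset size: 7299373 bytes
Dataset usage
- Use: training and validating perspective-aware models, in particular for irony detection.
- Characteristics:
  - An annotated corpus for irony analysis based on data perspectivism principles.
  - Contains social media conversations in five regional varieties of English, annotated by contributors from the corresponding countries.
  - Explores annotators' perspectives, taking into account their origin, age, and gender.
  - Shows how irony perception varies across demographic groups.
  - Serves as a valuable resource for training perspective-aware irony detection models.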



