AMATC-LLM Augmented ArSarcasm-v2

Source: NIAID Data Ecosystem (indexed 2026-05-10)

Download link:

https://doi.org/10.7910/DVN/REGHGA


Description:

This dataset contains synthetic Arabic tweets generated by the AMATC-LLM framework, an augmentation system built on top of the original ArSarcasm-v2 corpus. The data were produced through a two-stage process that combines human conceptual abstraction with controlled large language model (LLM) generation to create context-rich, dialect-aware Arabic text. Each record consists of a generated tweet labeled for sarcasm (TRUE or FALSE), sentiment (POS, NEG, or NEU), and dialect (magreb, egypt, levant, gulf, or msa). Only LLM-generated samples are included; the original ArSarcasm-v2 data are excluded to respect their license. This resource supports research in Arabic multi-task learning, sarcasm detection, sentiment analysis, and dialect identification, with a focus on low-resource and multi-dialect Arabic NLP. The original ArSarcasm-v2 dataset, without our LLM augmentation data, is available at: https://github.com/iabufarha/ArSarcasm-v2
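The label scheme described above can be checked mechanically before training. The following is a minimal sketch, not the authors' tooling: the column names (`tweet`, `sarcasm`, `sentiment`, `dialect`), the CSV layout, and the placeholder rows are all assumptions made for illustration, so the actual file header should be confirmed after downloading from the DOI link.

```python
import csv
import io
from collections import Counter

# Allowed label values, taken from the dataset description.
SARCASM = {"TRUE", "FALSE"}
SENTIMENT = {"POS", "NEG", "NEU"}
DIALECT = {"magreb", "egypt", "levant", "gulf", "msa"}

# Hypothetical column names; the real file may use different headers.
ALLOWED = {"sarcasm": SARCASM, "sentiment": SENTIMENT, "dialect": DIALECT}


def validate_row(row):
    """Return a list of problems found in one record (empty if valid)."""
    problems = []
    if not (row.get("tweet") or "").strip():
        problems.append("empty tweet")
    for column, allowed in ALLOWED.items():
        value = row.get(column)
        if value not in allowed:
            problems.append(f"bad {column} label: {value!r}")
    return problems


def label_counts(rows, column):
    """Count how often each label occurs in the given column."""
    return Counter(row[column] for row in rows)


# Placeholder rows for illustration only; these are NOT real dataset records.
SAMPLE = """tweet,sarcasm,sentiment,dialect
<placeholder tweet 1>,TRUE,NEG,egypt
<placeholder tweet 2>,FALSE,NEU,msa
<placeholder tweet 3>,MAYBE,POS,gulf
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
valid = [r for r in rows if not validate_row(r)]
invalid = [(i, validate_row(r)) for i, r in enumerate(rows) if validate_row(r)]

print(len(valid), "valid rows")
print("invalid:", invalid)
print("dialects:", dict(label_counts(valid, "dialect")))
```

To run this against the downloaded file, replace `io.StringIO(SAMPLE)` with `open(path, encoding="utf-8", newline="")`, where `path` points at the local copy. Because the three labels form separate tasks, the per-column counts give a quick view of class balance for each task.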

Created:

2025-11-16
