
AMATC-LLM Augmented ArSarcasm-v2

Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://doi.org/10.7910/DVN/REGHGA
Description:
This dataset contains synthetic Arabic tweets generated by the AMATC-LLM framework, an augmentation system built on top of the original ArSarcasm-v2 corpus. The data were produced through a two-stage process that combines human conceptual abstraction with controlled large language model (LLM) generation to create context-rich, dialect-aware Arabic text. Each record includes a generated tweet labeled for sarcasm (TRUE or FALSE), sentiment (POS, NEG, or NEU), and dialect (magreb, egypt, levant, gulf, or msa). Only LLM-generated samples are included; the original ArSarcasm-v2 data are excluded to respect their license. This resource supports research in Arabic multi-task learning, sarcasm detection, sentiment analysis, and dialect identification, with a focus on low-resource and multi-dialect Arabic NLP. The original ArSarcasm-v2 dataset, without the LLM-augmented data, is available at: https://github.com/iabufarha/ArSarcasm-v2
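The label scheme described above can be checked mechanically once the file is downloaded. The following is a minimal sketch, assuming the data ship as a CSV with columns named `tweet`, `sarcasm`, `sentiment`, and `dialect`; those column names and the sample rows are hypothetical, so check them against the actual file header. Only the label values themselves come from the description.

```python
import csv
import io

# Label sets as listed in the dataset description.
SARCASM = {"TRUE", "FALSE"}
SENTIMENT = {"POS", "NEG", "NEU"}
DIALECT = {"magreb", "egypt", "levant", "gulf", "msa"}

# Hypothetical column names (assumption); adjust to the real file header.
ALLOWED = {"sarcasm": SARCASM, "sentiment": SENTIMENT, "dialect": DIALECT}


def invalid_rows(reader):
    """Return (row_number, column, value) for each label outside its allowed set."""
    problems = []
    for i, row in enumerate(reader, start=1):
        for column, allowed in ALLOWED.items():
            value = (row.get(column) or "").strip()
            if value not in allowed:
                problems.append((i, column, value))
    return problems


# Made-up placeholder rows for illustration, not taken from the dataset.
sample = io.StringIO(
    "tweet,sarcasm,sentiment,dialect\n"
    "placeholder text 1,TRUE,NEG,egypt\n"
    "placeholder text 2,FALSE,POS,maghreb\n"
)
problems = invalid_rows(csv.DictReader(sample))
print(problems)
```

Note that the description spells the North African dialect label as `magreb`, so a value such as `maghreb` would be flagged by this check. To run it on a real download, replace the `io.StringIO` sample with an opened file handle.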
Created:
2025-11-16