BanglaSarc3: A Benchmark Dataset for Bangla Sarcasm Detection from Social Media to Advance Bangla NLP
Source: Mendeley Data. Indexed: 2026-04-18
Download link:
https://data.mendeley.com/datasets/7tn76wdhsr
Description:
The BanglaSarc3 dataset serves as a benchmark resource for sarcasm classification in Bangla, with balanced representation across its categories. Its primary objective is to reduce the misinterpretation of humor, which often leads to conflicts and misunderstandings in online communication. To improve dataset quality, preprocessing steps such as anonymization, duplicate removal, and text normalization were applied. Additionally, three native Bangla speakers independently reviewed and validated the labels to ensure annotation reliability.
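The description names three preprocessing steps without specifying how they were carried out. The following is only a minimal sketch of what such a pipeline might look like, not the authors' actual procedure. The placeholder tokens `<URL>` and `<USER>`, the choice of NFC normalization, and the function names `normalize` and `preprocess` are all assumptions made for illustration.

```python
import re
import unicodedata

# Hypothetical patterns: the dataset description does not say what was anonymized.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\S+")


def normalize(text: str) -> str:
    """Normalize one comment (assumed steps, not the authors' documented ones)."""
    # Unicode NFC normalization so visually identical strings compare equal.
    text = unicodedata.normalize("NFC", text)
    # Anonymization sketch: mask links and user mentions with placeholders.
    text = URL_RE.sub("<URL>", text)
    text = MENTION_RE.sub("<USER>", text)
    # Collapse runs of whitespace and trim both ends.
    return " ".join(text.split())


def preprocess(comments):
    """Normalize comments, then drop empty strings and exact duplicates."""
    seen = set()
    cleaned = []
    for comment in comments:
        norm = normalize(comment)
        if norm and norm not in seen:
            seen.add(norm)
            cleaned.append(norm)
    return cleaned
```

Deduplicating after normalization, rather than before, means two comments that differ only in whitespace or in the masked user name are treated as the same comment. Whether that is desirable depends on the annotation goals.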
BanglaSarc3 is a ternary-class dataset containing 12,089 Facebook comments, categorized as follows:
- Neutral: 4,056 comments
- Sarcastic: 4,012 comments
- Non-Sarcastic: 4,021 comments
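The balance claim can be checked directly from the counts listed above. The short sketch below computes each class's share and the ratio of the largest class to the smallest. The variable names are illustrative, and the counts are copied from this description rather than read from the dataset files.

```python
# Class counts as listed in the dataset description.
counts = {"Neutral": 4056, "Sarcastic": 4012, "Non-Sarcastic": 4021}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

# Imbalance ratio: largest class size divided by smallest class size.
imbalance_ratio = max(counts.values()) / min(counts.values())

for label, share in shares.items():
    print(f"{label}: {counts[label]} comments ({share:.1%})")
print(f"total: {total}, imbalance ratio: {imbalance_ratio:.3f}")
```

The three counts sum to 12,089, each class holds roughly one third of the comments, and the imbalance ratio is about 1.011, which is consistent with the description of the dataset as balanced.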
The BanglaSarc3 dataset has significant implications across multiple NLP and AI domains, including:
1. Sarcasm Detection in Bangla Social Media
2. Sentiment and Emotion Analysis
3. Language Modeling and BNLP Advancements
4. Explainable AI (XAI) in Bangla NLP
5. Educational and Research Applications
The BanglaSarc3 dataset is openly available for academic and research purposes, fostering collaboration and innovation within the Bangla NLP community. By providing a robust foundation for sarcasm classification, this dataset aims to drive advancements in Bangla-centric AI applications, ensuring more inclusive and context-aware language models.
Created:
2025-02-24



