
banglish-sentiment-2026

Source: Hugging Face. Updated: 2026-03-14. Indexed: 2026-03-20.
Download link:
https://huggingface.co/datasets/mdsajjadullah/banglish-sentiment-2026
Description:
Banglish Sentiment Dataset 2026 is a synthetic dataset for natural language processing (NLP) tasks. It contains approximately 25,000 unique Banglish text messages (Bengali written in the English alphabet, e.g., 'Ajke onek valo lagse'), each labeled as positive, negative, or neutral. It aims to fill a research gap in sentiment analysis for code-mixed Bengali-English (Banglish) text and is intended for chatbots, social media sentiment analysis tools, and low-resource language research. The dataset is entirely AI-generated and has been checked for uniqueness; it includes emojis, punctuation, spelling variations, and sentiment repetition to make the text more realistic. The three sentiment labels are roughly balanced at about 33% each. The dataset is particularly suited to sentiment classification with BanglaBERT or multilingual models, and to education and research in Banglish NLP.
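
The uniqueness and label-balance claims in the description can be spot-checked after download. The following is a minimal sketch using only the Python standard library. The column names `text` and `label`, the label strings, and all sample rows other than the first are assumptions made up for illustration; they are not taken from the dataset, and the normalization rule (lowercasing and collapsing whitespace) is one possible choice, not the authors' documented procedure.

```python
from collections import Counter

# Hypothetical sample rows. Only the first text comes from the dataset description;
# the others, the column names, and the label strings are illustrative assumptions.
rows = [
    {"text": "Ajke onek valo lagse", "label": "positive"},
    {"text": "Ajke onek kharap lagse", "label": "negative"},
    {"text": "Ajke office jabo", "label": "neutral"},
    {"text": "ajke  onek valo lagse ", "label": "positive"},  # duplicate after normalization
]


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return " ".join(text.lower().split())


def drop_duplicates(rows):
    """Keep the first row for each normalized text, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = normalize(row["text"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def label_shares(rows):
    """Return the fraction of rows carrying each label."""
    counts = Counter(row["label"] for row in rows)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


unique_rows = drop_duplicates(rows)
shares = label_shares(unique_rows)

print(f"{len(rows) - len(unique_rows)} duplicate row(s) removed")
for label, share in sorted(shares.items()):
    print(f"{label}: {share:.1%}")
```

On the real data, shares close to 33% per label and zero removed duplicates would be consistent with the description; a stricter normalization (for example, stripping emojis or punctuation) may surface near-duplicates that an exact-match check misses.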
Created:
2026-03-07