BanglaSportsEmotion: A Multi-Class Sentiment Dataset for Bangla Sports Commentary

Indexed from NIAID Data Ecosystem on 2026-05-10
Download link:
https://data.mendeley.com/datasets/ykxsndr53y
Description:
BanglaSportsEmotion addresses a gap in Bangla natural language processing by providing what is described as the first comprehensive, multi-sport sentiment corpus designed specifically for emotion analysis in sports commentary. It is a manually annotated dataset of 8,582 Bangla sports comments collected from Facebook and YouTube. The dataset lets researchers develop and benchmark sentiment analysis models for Bangla, particularly in the sports domain, where fan emotions range from celebration to criticism.

Data Collection Sources and Scope
Data was collected from a range of Bangla sports-related online platforms to ensure broad coverage and diversity:
1. Sources: Publicly accessible comments from sports-related Facebook pages/groups and YouTube channels (examples include bdcrictime.com discussions, T Sports video comments, and RabbitHoleBD sports threads).
2. Initial raw volume: Approximately 16,000 raw comments were collected before filtering.
3. Final released volume: 8,582 comments after deduplication, spam removal, and relevance filtering.
4. Sports coverage: Cricket, football, volleyball, hockey, and other sports.
5. Geographic scope: Comments about Bangladeshi national teams, international teams, club football, and various sporting events.
6. Time period: Recent comments reflecting current fan discourse and language usage.

Class Definitions
i. Joy (Label 0) - Positive emotions such as happiness, excitement, celebration, or praise for a team or player.
ii. Anger (Label 1) - Negative emotions directed toward one's own team, players, or performance.
iii. Support (Label 2) - Encouragement, loyalty, or backing for a team or player regardless of the outcome.
iv. Toxic (Label 3) - Harsh, offensive, or sarcastic remarks, often directed at opponents or rival fans.

Key Features
1. Fairly balanced class distribution.
2. Multi-sport coverage for broader generalizability.
3. Clear annotation guidelines for reproducibility.
4. High inter-annotator agreement.
5. Captures nuanced emotions, including semantic overlap between the Anger and Toxic classes.

Use Cases
i. Sentiment analysis model development for Bangla.
ii. Low-resource NLP research.
iii. Sports analytics and fan engagement studies.
iv. Benchmark evaluation for transformer and classical ML models.
v. Cross-lingual sentiment analysis studies.

File Format
CSV file with two columns:
- Comment text (Bangla)
- Class label (0: Joy; 1: Anger; 2: Support; 3: Toxic)
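The two-column layout described under File Format can be read with a few lines of standard-library Python. The following is a minimal sketch, not the authors' tooling: the function names `read_comments` and `class_counts` are hypothetical, the header names of the real file are not documented (so columns are read by position), and the in-memory sample rows are placeholders rather than real dataset entries.

```python
import csv
import io
from collections import Counter

# Label mapping as stated in the dataset description.
LABEL_NAMES = {0: "Joy", 1: "Anger", 2: "Support", 3: "Toxic"}


def read_comments(handle):
    """Yield (text, label_name) pairs from a two-column CSV stream.

    Columns are read by position because the header names are not
    documented. If the first row's second field is not an integer,
    that row is assumed to be a header and skipped.
    """
    for row_number, row in enumerate(csv.reader(handle)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        text, raw_label = row[0], row[1].strip()
        try:
            label = int(raw_label)
        except ValueError:
            if row_number == 0:
                continue  # assumed header row
            raise
        yield text, LABEL_NAMES[label]


def class_counts(pairs):
    """Count comments per class name."""
    return Counter(name for _, name in pairs)


# Placeholder stand-in for the real file (hypothetical header and rows).
sample = io.StringIO(
    "comment,label\n"
    "placeholder one,0\n"
    "placeholder two,3\n"
    "placeholder three,0\n"
)
counts = class_counts(read_comments(sample))
print(counts)
```

To run this against the downloaded file, replace `sample` with a handle such as `open(path, encoding="utf-8", newline="")`, where `path` points to wherever the CSV was saved. Comparing the resulting counts across the four classes is a quick way to check the stated class balance.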
Created:
2026-02-12