BanglaSportsEmotion: A Multi-Class Sentiment Dataset for Bangla Sports Commentary
Source: NIAID Data Ecosystem (indexed 2026-05-10)
Download link:
https://data.mendeley.com/datasets/ykxsndr53y
Description:
BanglaSportsEmotion is a manually annotated dataset of 8,582 Bangla sports comments collected from Facebook and YouTube. It addresses a critical gap in Bangla natural language processing by providing the first comprehensive, multi-sport sentiment corpus designed specifically for emotion analysis in sports commentary. The dataset lets researchers develop and benchmark sentiment analysis models for Bangla, particularly in the sports domain, where fan emotions range from celebration to criticism.
Data Collection Sources and Scope
Data was collected from various Bangla sports-related online platforms to ensure broad coverage and diversity:
1. Sources: Publicly accessible comments collected from sports-related Facebook pages/groups and YouTube channels (examples include bdcrictime.com discussions, T Sports video comments, RabbitHoleBD sports threads).
2. Initial raw volume: Approximately 16,000 comments were collected prior to filtering.
3. Final released volume: 8,582 comments after deduplication, spam removal, and relevance filtering.
4. Sports Coverage: Cricket, football, volleyball, hockey, and other sports.
5. Geographic Scope: Comments about Bangladeshi national teams, international teams, club football, and various sporting events.
6. Time Period: Recent comments reflecting current fan discourse and language usage.
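The description does not say how deduplication was carried out. As a minimal sketch of one plausible approach, the snippet below treats two comments as duplicates when they match after Unicode NFC normalization and whitespace collapsing. The function names `normalize` and `deduplicate` and the sample comments are illustrative assumptions, not part of the dataset's actual pipeline. The last line computes the approximate share of raw comments that survived filtering, using the two counts quoted above.

```python
import unicodedata


def normalize(comment: str) -> str:
    """Canonicalize a comment so trivially different copies compare equal."""
    # NFC maps canonically equivalent Unicode sequences to a single form.
    text = unicodedata.normalize("NFC", comment)
    # Collapse runs of whitespace and trim both ends.
    return " ".join(text.split())


def deduplicate(comments):
    """Keep the first occurrence of each normalized comment, preserving order."""
    seen = set()
    kept = []
    for comment in comments:
        key = normalize(comment)
        # Skip empty comments and anything already seen.
        if key and key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


# Illustrative input only; these are not rows from the dataset.
raw = ["দারুণ খেলা!", "দারুণ  খেলা! ", "", "সাবাশ টাইগার্স"]
unique = deduplicate(raw)

# Released count divided by the approximate raw count from the description.
retention = 8582 / 16000
print(len(unique), f"{retention:.1%}")
```

Because the raw count is only approximate, the retention figure should be read as roughly half of the collected comments rather than as an exact rate.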
Class Definitions:
i. Joy (Label 0) - Positive emotions such as happiness, excitement, celebration, or praise for a team or player.
ii. Anger (Label 1) - Negative emotions directed toward one's own team, players, or performance.
iii. Support (Label 2) - Encouragement, loyalty, or backing for a team or player regardless of the outcome.
iv. Toxic (Label 3) - Harsh, offensive, or sarcastic remarks often directed at opponents or rival fans.
Key Features:
1. Fairly balanced class distribution
2. Multi-sport coverage ensuring broader generalizability
3. Clear annotation guidelines for reproducibility
4. High inter-annotator agreement
5. Captures nuanced emotions including semantic overlap between Anger and Toxic classes
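The description reports high inter-annotator agreement without naming the metric or its value. One common measure for two annotators is Cohen's kappa, sketched below under that assumption. The function name `cohen_kappa` and the toy label lists are hypothetical; the two disagreements are placed between Anger (1) and Toxic (3) only to mirror the overlap mentioned above.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy annotations (0: Joy, 1: Anger, 2: Support, 3: Toxic); not real data.
annotator_a = [0, 1, 2, 3, 0, 1, 2, 3]
annotator_b = [0, 1, 2, 3, 0, 3, 2, 1]
kappa = cohen_kappa(annotator_a, annotator_b)
print(round(kappa, 3))
```

With more than two annotators, a multi-rater statistic such as Fleiss' kappa would be the analogous choice.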
Use Cases:
i. Sentiment analysis model development for Bangla
ii. Low-resource NLP research
iii. Sports analytics and fan engagement studies
iv. Benchmark evaluation for transformer and classical ML models
v. Cross-lingual sentiment analysis studies
File Format
CSV file with two columns:
- Comment text (Bangla)
- Class label (0: Joy; 1: Anger; 2: Support; 3: Toxic)
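The two-column layout above can be read with Python's standard `csv` module. The sketch below is hedged on several points the description leaves open: the column order (text first, label second), the presence of a header row, and the header names `comment` and `label` are all assumptions, and the sample rows are invented for illustration. Only the label-to-class mapping comes from the description.

```python
import csv
import io
from collections import Counter

# Label mapping as given in the dataset description.
LABEL_NAMES = {0: "Joy", 1: "Anger", 2: "Support", 3: "Toxic"}


def load_rows(handle, has_header=True):
    """Read (text, label) pairs from a two-column CSV file object."""
    reader = csv.reader(handle)
    if has_header:
        # Assumption: the first row holds column names, so skip it.
        next(reader, None)
    rows = []
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        rows.append((row[0], int(row[1])))
    return rows


# In-memory stand-in for the real file; header names are hypothetical.
sample = io.StringIO(
    "comment,label\n"
    "দারুণ জয়!,0\n"
    "এভাবে খেললে হবে না,1\n"
    "পাশে আছি সবসময়,2\n"
)
rows = load_rows(sample)

# Count comments per class name to inspect the class balance.
counts = Counter(LABEL_NAMES[label] for _, label in rows)
print(counts)
```

For the downloaded file, the same function can be called on `open(path, encoding="utf-8", newline="")`, where `path` is wherever the CSV was saved. Running the class count on the full file is a quick way to check the stated class balance.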
Created:
2026-02-12



