TamilMixSentiment
Source: arXiv | Updated: 2020-05-30 | Indexed: 2024-06-21
Download link:
https://github.com/bharathichezhiyan/TamilMixSentiment
Overview:
TamilMixSentiment is a sentiment analysis dataset for Tamil-English code-mixed text, created by the Insight SFI Research Centre. It contains 15,744 comments collected from YouTube and targets sentiment analysis for low-resource languages. The comments were gathered with a YouTube comment scraper, then filtered and preprocessed using a language detection library. The dataset is mainly intended for sentiment analysis of social media video comments, particularly code-mixed text from multilingual communities.
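
The filtering step mentioned above can be illustrated with a rough sketch. This is not the dataset authors' pipeline: it swaps the language detection library for a simple Unicode script check, and the helper names (`script_profile`, `keep_candidates`) are hypothetical. A script check cannot tell romanized Tamil from English, so a real filter would still need a language detector on the Latin-script comments.

```python
import re

# Tamil Unicode block: U+0B80 to U+0BFF.
TAMIL_CHARS = re.compile(r"[\u0B80-\u0BFF]")
LATIN_CHARS = re.compile(r"[A-Za-z]")


def script_profile(comment: str) -> str:
    """Label a comment by the scripts it contains (hypothetical helper)."""
    has_tamil = bool(TAMIL_CHARS.search(comment))
    has_latin = bool(LATIN_CHARS.search(comment))
    if has_tamil and has_latin:
        return "mixed-script"
    if has_tamil:
        return "tamil-script"
    if has_latin:
        return "latin-script"
    return "other"


def keep_candidates(comments):
    """Keep comments with Tamil or Latin letters; drop everything else."""
    return [c for c in comments if script_profile(c) != "other"]
```

Comments labeled "latin-script" may be English, romanized Tamil, or a mix of both, which is the case a language detection pass is meant to resolve.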
Provided by:
Insight SFI Research Centre for Data Analytics, Data Science Institute, National University of Ireland Galway
Created:
2020-05-30



