SentMix-3L
Source: arXiv. Updated: 2023-11-29. Indexed: 2024-06-21.
Download link:
https://github.com/LanguageTechnologyLab/SentMix3L
Resource overview:
SentMix-3L is a sentiment analysis dataset created by George Mason University. It contains 1,007 instances of code-mixed text combining three languages: Bengali, English, and Hindi. The data was collected from social media content posted by a group of students proficient in all three languages, with the aim of keeping it authentic and diverse. The creation process consists of data collection followed by a two-step annotation process intended to yield high-quality sentiment labels. SentMix-3L is intended primarily for sentiment analysis research on multilingual code-mixed text and serves as a resource for cross-lingual sentiment analysis.
Provider:
George Mason University
Created:
2023-10-27



