Massively Multilingual Corpus of Sentiment Datasets
Source: arXiv. Updated: 2023-06-14. Indexed: 2024-06-21.
Download link:
https://huggingface.co/datasets/Brand24/mms
Description:
The Massively Multilingual Corpus of Sentiment Datasets is a large multilingual sentiment corpus created jointly by Wrocław University of Science and Technology and Brand24 AI. It brings together 79 rigorously selected datasets covering 27 languages from 6 language families. The texts come from several domains, including social media, reviews, and news. The corpus is intended to advance research on cross-lingual sentiment analysis by providing a multi-faceted sentiment classification benchmark. The datasets were preprocessed and quality-controlled during construction to keep the data consistent. It is mainly used for model training and performance evaluation in cross-lingual sentiment analysis, particularly where cultural and linguistic differences are pronounced.
Provided by:
Wrocław University of Science and Technology
Created:
2023-06-14
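
Usage sketch:
Since the corpus spans 27 languages, a first step is often to check how sentiment labels are distributed per language. The snippet below is a minimal sketch of that, not an official loader. It assumes the Hugging Face `datasets` library is installed, that the repository `Brand24/mms` (from the download link above) exposes a `train` split, and that each record has `language` and `label` fields. The split name and both field names are assumptions not confirmed by this listing, and access to the repository may require authentication.

```python
from collections import Counter, defaultdict


def label_counts_by_language(records, language_key="language", label_key="label"):
    """Count sentiment labels per language.

    `records` is any iterable of dict-like rows. The default key names
    are assumptions; adjust them to the actual column names.
    """
    counts = defaultdict(Counter)
    for row in records:
        counts[row[language_key]][row[label_key]] += 1
    # Convert to plain dicts so the result is easy to print or compare.
    return {lang: dict(per_label) for lang, per_label in counts.items()}


def load_mms(split="train"):
    """Load the corpus from the Hugging Face Hub.

    The split name is an assumption. The import is done lazily so the
    counting helper above works without `datasets` installed.
    """
    from datasets import load_dataset

    return load_dataset("Brand24/mms", split=split)


if __name__ == "__main__":
    # Hypothetical rows, for illustration only (not taken from the corpus).
    sample = [
        {"language": "en", "label": "positive"},
        {"language": "en", "label": "negative"},
        {"language": "pl", "label": "positive"},
        {"language": "en", "label": "positive"},
    ]
    print(label_counts_by_language(sample))
```

To run it against the real data, pass the result of `load_mms()` to `label_counts_by_language` after confirming the column names on the dataset page.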



