Improved Turkish DataSet for Sentiment Analysis in Social Media
Source: arXiv; updated 2018-01-31; indexed 2024-08-06
Download link:
http://arxiv.org/abs/1801.09975v2
Resource description:
Improved Turkish DataSet for Sentiment Analysis in Social Media is a dataset for social media sentiment analysis created by the Department of Computer Engineering at Firat University. It contains 2,983 movie reviews, collected automatically with Apache ManifoldCF and processed with Hadoop and MapReduce. The dataset aims to improve the accuracy of Turkish sentiment analysis through an effective spelling correction algorithm, which addresses the inflation of the word vector space caused by misspellings. It is suited to text mining applications such as sentiment analysis, event prediction, and trend detection, and can help improve the performance of machine learning algorithms on social media text.
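The effect of misspellings on vocabulary size can be illustrated with a small sketch. This is not the spelling correction algorithm used to build the dataset: it substitutes Python's standard `difflib.get_close_matches` as a stand-in corrector, and the lexicon, the sample reviews, the `cutoff` value, and the helper names `correct` and `vocabulary` are all assumptions made up for illustration.

```python
from difflib import get_close_matches

# Hypothetical toy lexicon of correctly spelled Turkish words (not taken from the dataset).
LEXICON = ["film", "güzel", "harika", "kötü", "çok"]

# Made-up reviews containing typical misspellings (not taken from the dataset).
REVIEWS = [
    "film çok güzel",
    "flim cok guzel",
    "filim çok güzell",
    "film kötü",
    "film kötu",
]


def correct(token, lexicon=LEXICON, cutoff=0.6):
    """Return the closest lexicon entry for a token, or the token unchanged."""
    matches = get_close_matches(token, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def vocabulary(reviews, normalize=False):
    """Collect distinct tokens, i.e. the dimensions of a bag-of-words space."""
    vocab = set()
    for review in reviews:
        for token in review.split():
            vocab.add(correct(token) if normalize else token)
    return vocab


raw = vocabulary(REVIEWS)
fixed = vocabulary(REVIEWS, normalize=True)

print("distinct tokens before correction:", len(raw))
print("distinct tokens after correction:", len(fixed))
```

In this toy input every misspelled variant adds its own dimension to the feature space, and mapping variants back to a canonical form collapses them. A real corrector for Turkish would need to handle agglutinative morphology, which simple string similarity does not.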
Provider:
Department of Computer Engineering, Firat University
Created:
2018-01-30



