SINAI/MCE-Corpus
Dataset overview
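Dataset name: MuchoCine corpus in English (MCE)
Dataset description: MCE is the English translation of the MuchoCine corpus, a collection of Spanish film reviews. It is used for sentiment polarity detection that combines supervised and unsupervised approaches: three classifiers (two supervised and one unsupervised) carry out the sentiment analysis. The polarity of each document is rated on a scale from 1 to 5, where 1 means very bad and 5 means very good.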
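The combination of three classifiers described above can be illustrated with a simple majority vote. This is a minimal sketch, not the authors' implementation: the function names `majority_vote` and `combine`, the label strings "pos" and "neg", and the tie-breaking rule are all assumptions made for illustration, and the per-classifier predictions are invented toy values rather than results from the corpus.

```python
from collections import Counter


def majority_vote(predictions, tie_breaker=0):
    """Return the most frequent label among the classifiers' predictions.

    If several labels share the top count, fall back to the prediction of
    the classifier at index `tie_breaker` (an assumed rule, chosen only to
    keep the sketch deterministic).
    """
    counts = Counter(predictions)
    top_count = max(counts.values())
    winners = [label for label, count in counts.items() if count == top_count]
    if len(winners) == 1:
        return winners[0]
    return predictions[tie_breaker]


def combine(spanish_preds, english_preds, lexicon_preds):
    """Combine per-document labels from three classifiers by majority vote.

    The three arguments stand for the two supervised models and the one
    unsupervised model; each is a list with one label per document.
    """
    return [
        majority_vote(votes)
        for votes in zip(spanish_preds, english_preds, lexicon_preds)
    ]


if __name__ == "__main__":
    # Toy predictions for three documents (hypothetical values).
    spanish = ["pos", "neg", "pos"]
    english = ["pos", "pos", "neg"]
    lexicon = ["neg", "neg", "neg"]
    print(combine(spanish, english, lexicon))
```

With two possible labels and three voters a tie cannot occur, so the fallback only matters if more label values are used, for example the full 1 to 5 rating scale.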
Contact:
- jmperea@ujaen.es
- emcamara@ujaen.es
Reference paper:
- Title: Sentiment polarity detection in Spanish reviews combining supervised and unsupervised approaches
- Authors: María-Teresa Martín-Valdivia, Eugenio Martínez-Cámara, Jose-M. Perea-Ortega, L. Alfonso Ureña-López
- Year: 2013
- Journal: Expert Systems with Applications
- Volume/issue: 40/10
- Pages: 3934-3942
- DOI: https://doi.org/10.1016/j.eswa.2012.12.084
License: Apache-2.0
Citation (BibTeX):
@article{MARTINVALDIVIA20133934,
  title = {Sentiment polarity detection in Spanish reviews combining supervised and unsupervised approaches},
  journal = {Expert Systems with Applications},
  volume = {40},
  number = {10},
  pages = {3934-3942},
  year = {2013},
  issn = {0957-4174},
  doi = {https://doi.org/10.1016/j.eswa.2012.12.084},
  url = {https://www.sciencedirect.com/science/article/pii/S0957417412013267},
  author = {María-Teresa Martín-Valdivia and Eugenio Martínez-Cámara and Jose-M. Perea-Ortega and L. Alfonso Ureña-López},
  keywords = {Sentiment polarity detection, Multilingual opinion mining, Spanish review corpus, SentiWordNet, Metaclassifiers, Stacking algorithm, Voting system},
  abstract = {Sentiment polarity detection is one of the most popular tasks related to Opinion Mining. Many papers have been presented describing one of the two main approaches used to solve this problem. On the one hand, a supervised methodology uses machine learning algorithms when training data exist. On the other hand, an unsupervised method based on a semantic orientation is applied when linguistic resources are available. However, few studies combine the two approaches. In this paper we propose the use of meta-classifiers that combine supervised and unsupervised learning in order to develop a polarity classification system. We have used a Spanish corpus of film reviews along with its parallel corpus translated into English. Firstly, we generate two individual models using these two corpora and applying machine learning algorithms. Secondly, we integrate SentiWordNet into the English corpus, generating a new unsupervised model. Finally, the three systems are combined using a meta-classifier that allows us to apply several combination algorithms such as voting system or stacking. The results obtained outperform those obtained using the systems individually and show that this approach could be considered a good strategy for polarity classification when we work with parallel corpora.}
}



