SINAI/MCE-Corpus

Source: Hugging Face. Updated: 2024-03-22. Indexed: 2024-06-11.
Download link: https://hf-mirror.com/datasets/SINAI/MCE-Corpus
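Resource overview:
MuchoCine corpus in English (MCE) is the English translation of MuchoCine, a corpus of Spanish movie reviews. The original corpus was developed by the researcher Fermín Cruz Mata and published in 2008 in issue 41 of the journal Procesamiento del Lenguaje Natural (Natural Language Processing). The dataset is intended for sentiment polarity detection: each document carries a polarity rating from 1 to 5, where 1 means very bad and 5 means very good. It was used in a study on sentiment polarity detection in Spanish reviews that combines supervised and unsupervised classification and draws on English-language resources for sentiment analysis.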

Provider:
SINAI
Original information summary

Dataset overview

Dataset name: MuchoCine corpus in English (MCE)

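Dataset description: MCE is the translated version of the MuchoCine corpus (Spanish movie reviews). It is used for sentiment polarity detection in work that combines supervised and unsupervised approaches through three classifiers: two supervised and one unsupervised. Document polarity is rated on a scale from 1 to 5, where 1 means very bad and 5 means very good.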
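
The description above refers to three classifiers whose outputs are combined. A minimal sketch of the simplest such combination, majority voting, follows. The binary `positive`/`negative` labels, the function name `majority_vote`, and the sample predictions are assumptions made for illustration only; they are not taken from the dataset or from the authors' implementation.

```python
from collections import Counter
from typing import Sequence


def majority_vote(predictions: Sequence[str]) -> str:
    """Return the label chosen by the largest number of classifiers.

    With an odd number of voters and two possible labels, a tie cannot occur.
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    # most_common(1) yields a single (label, count) pair for the top label.
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical per-review outputs of the three base classifiers (made-up data).
spanish_supervised = ["positive", "negative", "negative"]
english_supervised = ["positive", "positive", "negative"]
english_unsupervised = ["negative", "positive", "negative"]

# Combine the three predictions for each review into one final label.
combined = [
    majority_vote(votes)
    for votes in zip(spanish_supervised, english_supervised, english_unsupervised)
]
print(combined)
```

Voting needs no extra training data, which makes it a convenient baseline; a stacking meta-classifier would instead learn how much to trust each base model.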

Contact:

  • jmperea@ujaen.es
  • emcamara@ujaen.es

Paper reference:

  • Title: Sentiment polarity detection in Spanish reviews combining supervised and unsupervised approaches
  • Authors: María-Teresa Martín-Valdivia, Eugenio Martínez-Cámara, Jose-M. Perea-Ortega, L. Alfonso Ureña-López
  • Year: 2013
  • Journal: Expert Systems with Applications
  • Volume/Issue: 40/10
  • Pages: 3934-3942
  • DOI: https://doi.org/10.1016/j.eswa.2012.12.084

License: Apache-2.0

Citation (BibTeX):

@article{MARTINVALDIVIA20133934,
  title = {Sentiment polarity detection in Spanish reviews combining supervised and unsupervised approaches},
  journal = {Expert Systems with Applications},
  volume = {40},
  number = {10},
  pages = {3934-3942},
  year = {2013},
  issn = {0957-4174},
  doi = {https://doi.org/10.1016/j.eswa.2012.12.084},
  url = {https://www.sciencedirect.com/science/article/pii/S0957417412013267},
  author = {María-Teresa Martín-Valdivia and Eugenio Martínez-Cámara and Jose-M. Perea-Ortega and L. Alfonso Ureña-López},
  keywords = {Sentiment polarity detection, Multilingual opinion mining, Spanish review corpus, SentiWordNet, Metaclassifiers, Stacking algorithm, Voting system},
  abstract = {Sentiment polarity detection is one of the most popular tasks related to Opinion Mining. Many papers have been presented describing one of the two main approaches used to solve this problem. On the one hand, a supervised methodology uses machine learning algorithms when training data exist. On the other hand, an unsupervised method based on a semantic orientation is applied when linguistic resources are available. However, few studies combine the two approaches. In this paper we propose the use of meta-classifiers that combine supervised and unsupervised learning in order to develop a polarity classification system. We have used a Spanish corpus of film reviews along with its parallel corpus translated into English. Firstly, we generate two individual models using these two corpora and applying machine learning algorithms. Secondly, we integrate SentiWordNet into the English corpus, generating a new unsupervised model. Finally, the three systems are combined using a meta-classifier that allows us to apply several combination algorithms such as voting system or stacking. The results obtained outperform those obtained using the systems individually and show that this approach could be considered a good strategy for polarity classification when we work with parallel corpora.}
}
