Alienmaster/omp_sa
Dataset Overview
- Name: One Million Posts Corpus - Sentiment Subset
- Language: German (de)
- License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (cc-by-nc-sa-4.0)
- Multilinguality: monolingual
- Size: 1K<n<10K
- Tags: Sentiment Analysis
- Task category: text-classification
Dataset Details
Data structure:
- Config name: default
- Column names: ["ID_Post","Headline","Body","Category"]
- Data files:
  - Split: full
  - Path: "full.csv"
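As a minimal sketch, the data file described above could be read with Python's standard `csv` module. This assumes `full.csv` is UTF-8 encoded, comma-separated, and starts with a header row naming exactly the listed columns; the `Post` class and the `read_posts` function are hypothetical helpers, not part of the dataset. With the Hugging Face `datasets` library installed, `load_dataset("Alienmaster/omp_sa", split="full")` should load the same split through the default config.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List

# Column names as listed in the dataset card.
EXPECTED_COLUMNS = ["ID_Post", "Headline", "Body", "Category"]


@dataclass
class Post:
    id_post: str
    headline: str
    body: str
    category: str  # sentiment label: "Positive", "Negative" or "Neutral"


def read_posts(lines: Iterable[str]) -> List[Post]:
    """Parse full.csv-style lines (e.g. an open file) into Post records.

    Assumes a comma-separated file whose header row names the four columns.
    """
    reader = csv.DictReader(lines)
    # Accessing fieldnames consumes the header row and lets us validate it.
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    posts = []
    for row in reader:
        posts.append(
            Post(
                id_post=row["ID_Post"],
                headline=row["Headline"] or "",  # headline may be empty
                body=row["Body"] or "",
                category=row["Category"],
            )
        )
    return posts
```

To read the actual file, pass an open handle: `with open("full.csv", encoding="utf-8", newline="") as f: posts = read_posts(f)`. Opening with `newline=""` lets the `csv` module handle line breaks inside quoted comment bodies correctly.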
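Dataset contents:
- The post IDs (ID_Post), headlines (Headline), and bodies (Body) of user comments, together with a sentiment label (Category) whose values were renamed to "Positive", "Negative", and "Neutral".
- The data consists of user comments from the website of an Austrian newspaper.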
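The following sketch shows one way to prepare these rows for the text-classification task: it joins headline and body into a single input text and maps the three sentiment labels to integer ids. The label order, the newline separator, and the function names `to_examples` and `label_distribution` are illustrative assumptions, not something the dataset card prescribes.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# The three label values named in the dataset card; the integer ids are an
# arbitrary, hypothetical choice for illustration.
LABELS = ["Negative", "Neutral", "Positive"]
LABEL2ID: Dict[str, int] = {label: i for i, label in enumerate(LABELS)}


def to_examples(rows: Sequence[Tuple[str, str, str]]) -> List[Tuple[str, int]]:
    """Turn (headline, body, category) triples into (text, label_id) pairs.

    Non-empty headline and body are joined with a newline; this is one common
    choice, not something the dataset prescribes.
    """
    examples = []
    for headline, body, category in rows:
        if category not in LABEL2ID:
            raise ValueError(f"unexpected label: {category!r}")
        text = "\n".join(part for part in (headline, body) if part)
        examples.append((text, LABEL2ID[category]))
    return examples


def label_distribution(rows: Sequence[Tuple[str, str, str]]) -> Counter:
    """Count how often each sentiment label occurs."""
    return Counter(category for _, _, category in rows)
```

Checking the label distribution before training is worthwhile because sentiment subsets are often imbalanced; the counts can then inform class weights or a stratified train/test split.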
Citation
@InProceedings{Schabus2018,
  author    = {Dietmar Schabus and Marcin Skowron},
  title     = {Academic-Industrial Perspective on the Development and Deployment of a Moderation System for a Newspaper Website},
  booktitle = {Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC)},
  year      = {2018},
  address   = {Miyazaki, Japan},
  month     = may,
  pages     = {1602-1605},
  abstract  = {This paper describes an approach and our experiences from the development, deployment and usability testing of a Natural Language Processing (NLP) and Information Retrieval system that supports the moderation of user comments on a large newspaper website. We highlight some of the differences between industry-oriented and academic research settings and their influence on the decisions made in the data collection and annotation processes, selection of document representation and machine learning methods. We report on classification results, where the problems to solve and the data to work with come from a commercial enterprise. In this context typical for NLP research, we discuss relevant industrial aspects. We believe that the challenges faced as well as the solutions proposed for addressing them can provide insights to others working in a similar setting.},
  url       = {http://www.lrec-conf.org/proceedings/lrec2018/summaries/8885.html},
}



