
Alienmaster/omp_sa

Source: Hugging Face. Updated 2024-04-12; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/Alienmaster/omp_sa
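Resource overview:
The One Million Posts Corpus is an annotated dataset of user comments posted to an Austrian newspaper website (in German). This subset contains only the post ID, headline, and body of posts that carry a sentiment label. The sentiment labels have been renamed to Positive, Negative, and Neutral for ease of use. If you are interested in the full dataset, use the official dataset on HuggingFace.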
Provider:
Alienmaster
Summary of original information

Dataset overview

  • Name: One Million Posts Corpus - Sentiment Subset
  • Language: German (de)
  • License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (cc-by-nc-sa-4.0)
  • Multilinguality: monolingual
  • Size: 1K<n<10K
  • Tags: Sentiment Analysis
  • Task category: text-classification

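Dataset details

  • Data structure:

    • Config name: default
    • Column names: ["ID_Post","Headline","Body","Category"]
    • Data files:
      • Split: full
      • Path: "full.csv"
  • Dataset contents:

    • Contains the post IDs, headlines, and bodies of user comments, along with sentiment labels (renamed to "Positive", "Negative", and "Neutral").
    • The data comes from user comments on an Austrian newspaper website.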
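
The structure listed above (four columns, three sentiment labels, a single CSV file) can be checked with a short script. The following is a minimal sketch using only the Python standard library. The sample rows are made-up placeholders, not text from the corpus, and the helper names `read_rows` and `label_counts` are hypothetical. It also assumes that the "Category" column holds the renamed sentiment label, which the column list suggests but does not state outright.

```python
import csv
import io
from collections import Counter

# Column names as listed in the dataset details.
EXPECTED_COLUMNS = ["ID_Post", "Headline", "Body", "Category"]
# Assumption: the "Category" column carries the renamed sentiment labels.
ALLOWED_LABELS = {"Positive", "Negative", "Neutral"}

# Made-up placeholder rows for illustration only; NOT taken from the corpus.
SAMPLE_CSV = """ID_Post,Headline,Body,Category
1,Beispiel A,Das ist ein Platzhaltertext.,Neutral
2,Beispiel B,Noch ein Platzhaltertext.,Positive
3,,Ein dritter Platzhaltertext.,Negative
4,Beispiel D,Ein vierter Platzhaltertext.,Neutral
"""


def read_rows(handle):
    """Read CSV rows and check the header and the sentiment labels."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    rows = []
    for row in reader:
        if row["Category"] not in ALLOWED_LABELS:
            raise ValueError(f"unexpected label: {row['Category']!r}")
        rows.append(row)
    return rows


def label_counts(rows):
    """Count how many posts carry each sentiment label."""
    return Counter(row["Category"] for row in rows)


rows = read_rows(io.StringIO(SAMPLE_CSV))
counts = label_counts(rows)
print(dict(counts))  # {'Neutral': 2, 'Positive': 1, 'Negative': 1}
```

To run the same check on a downloaded copy of the data file, pass an open file handle instead of the in-memory sample, for example `open("full.csv", newline="", encoding="utf-8")`. The UTF-8 encoding is an assumption and is not stated in the listing.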

Citation

@InProceedings{Schabus2018,
  author    = {Dietmar Schabus and Marcin Skowron},
  title     = {Academic-Industrial Perspective on the Development and Deployment of a Moderation System for a Newspaper Website},
  booktitle = {Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC)},
  year      = {2018},
  address   = {Miyazaki, Japan},
  month     = may,
  pages     = {1602-1605},
  abstract  = {This paper describes an approach and our experiences from the development, deployment and usability testing of a Natural Language Processing (NLP) and Information Retrieval system that supports the moderation of user comments on a large newspaper website. We highlight some of the differences between industry-oriented and academic research settings and their influence on the decisions made in the data collection and annotation processes, selection of document representation and machine learning methods. We report on classification results, where the problems to solve and the data to work with come from a commercial enterprise. In this context typical for NLP research, we discuss relevant industrial aspects. We believe that the challenges faced as well as the solutions proposed for addressing them can provide insights to others working in a similar setting.},
  url       = {http://www.lrec-conf.org/proceedings/lrec2018/summaries/8885.html},
}
