Alienmaster/omp_sa

Name: Alienmaster/omp_sa
Creator: Alienmaster
Published: 2024-04-12 09:24:04
License: cc-by-nc-sa-4.0

Source: Hugging Face. Updated 2024-04-12; indexed 2024-06-12.

Download link:

https://hf-mirror.com/datasets/Alienmaster/omp_sa

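Resource overview:

The "One Million Posts" corpus is an annotated dataset of German-language user comments posted to an Austrian newspaper website. This subset contains only the post IDs, headlines, and bodies of posts with sentiment labels. The sentiment labels have been renamed to Positive, Negative, and Neutral for ease of use. If you are interested in the full dataset, use the official dataset on Hugging Face.

Provided by: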

Alienmaster

Summary of original information

Dataset overview

Name: One Million Posts Corpus - Sentiment Subset
Language: German (de)
License: Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (cc-by-nc-sa-4.0)
Multilinguality: monolingual
Size: 1K<n<10K
Tags: Sentiment Analysis
Task category: text-classification

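Dataset details

Data structure:
- Config name: default
- Columns: ["ID_Post","Headline","Body","Category"]
- Data files:
  - Split: full
  - Path: "full.csv"
Dataset content:
- Contains the post IDs, headlines, and bodies of user comments, along with sentiment labels (renamed to "Positive", "Negative", and "Neutral").
- The data comes from user comments on an Austrian newspaper website.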
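
The structure above can be checked with a short script. The following is a minimal sketch, not an official loader. It assumes that the `Category` column holds the sentiment label (the card lists the columns and the labels but does not state this link explicitly). The helper names `read_posts` and `label_counts` are hypothetical, and the sample rows are invented for illustration rather than taken from the dataset.

```python
import csv
import io
from collections import Counter

# Column names and label values as listed in the dataset card.
EXPECTED_COLUMNS = ["ID_Post", "Headline", "Body", "Category"]
# Assumption: the sentiment label is stored in the "Category" column.
VALID_LABELS = {"Positive", "Negative", "Neutral"}


def read_posts(handle):
    """Parse CSV rows from a text handle and check them against the listed schema."""
    reader = csv.DictReader(handle)
    present = reader.fieldnames or []
    missing = [name for name in EXPECTED_COLUMNS if name not in present]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    rows = []
    for row in reader:
        if row["Category"] not in VALID_LABELS:
            raise ValueError(f"unexpected label: {row['Category']!r}")
        rows.append(row)
    return rows


def label_counts(rows):
    """Count how many posts carry each sentiment label."""
    return Counter(row["Category"] for row in rows)


# Hypothetical sample rows, invented for illustration (not real dataset content).
SAMPLE_CSV = """ID_Post,Headline,Body,Category
1,Beispiel,Das ist ein toller Artikel.,Positive
2,,Das sehe ich anders.,Negative
3,Frage,Wann erscheint der zweite Teil?,Neutral
4,,Danke für den Hinweis.,Positive
"""

rows = read_posts(io.StringIO(SAMPLE_CSV))
counts = label_counts(rows)
print(counts)
```

To run the same check on a downloaded copy of the data file, pass an open file handle instead, for example `open("full.csv", newline="", encoding="utf-8")`; the UTF-8 encoding is an assumption. The dataset can presumably also be loaded with the Hugging Face `datasets` library via `load_dataset("Alienmaster/omp_sa", split="full")`, using the split name listed above.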

Citation

@InProceedings{Schabus2018,
  author    = {Dietmar Schabus and Marcin Skowron},
  title     = {Academic-Industrial Perspective on the Development and Deployment of a Moderation System for a Newspaper Website},
  booktitle = {Proceedings of the 11th International Conference on Language Resources and Evaluation (LREC)},
  year      = {2018},
  address   = {Miyazaki, Japan},
  month     = may,
  pages     = {1602-1605},
  abstract  = {This paper describes an approach and our experiences from the development, deployment and usability testing of a Natural Language Processing (NLP) and Information Retrieval system that supports the moderation of user comments on a large newspaper website. We highlight some of the differences between industry-oriented and academic research settings and their influence on the decisions made in the data collection and annotation processes, selection of document representation and machine learning methods. We report on classification results, where the problems to solve and the data to work with come from a commercial enterprise. In this context typical for NLP research, we discuss relevant industrial aspects. We believe that the challenges faced as well as the solutions proposed for addressing them can provide insights to others working in a similar setting.},
  url       = {http://www.lrec-conf.org/proceedings/lrec2018/summaries/8885.html},
}
