shunk031/wrime

Name: shunk031/wrime
Creator: shunk031
Published: 2023-01-15 03:39:01
License: No description available

Hugging Face | Updated 2023-01-15 | Indexed 2024-03-04

Download link:

https://hf-mirror.com/datasets/shunk031/wrime

Resource description:

---
annotations_creators:
- crowdsourced
language:
- ja
language_creators:
- crowdsourced
license:
- unknown
multilinguality:
- monolingual
pretty_name: wrime
tags:
- sentiment-analysis
- wrime
task_categories:
- text-classification
task_ids:
- sentiment-classification
datasets:
- ver1
- ver2
metrics:
- accuracy
---

# Dataset Card for WRIME

[![CI](https://github.com/shunk031/huggingface-datasets_wrime/actions/workflows/ci.yaml/badge.svg)](https://github.com/shunk031/huggingface-datasets_wrime/actions/workflows/ci.yaml)

## Table of Contents

- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- Homepage: https://github.com/ids-cv/wrime
- Repository: https://github.com/shunk031/huggingface-datasets_wrime
- Paper: https://aclanthology.org/2021.naacl-main.169/

### Dataset Summary

In this study, we introduce a new dataset, WRIME, for emotional intensity estimation.
We collect both the subjective emotional intensity of the writers themselves and the objective one annotated by the readers, and explore the differences between them. In our data collection, we hired 50 participants via a crowdsourcing service. They annotated their own past posts on a social networking service (SNS) with the subjective emotional intensity. We also hired 3 annotators, who annotated all posts with the objective emotional intensity. Consequently, our Japanese emotion analysis dataset consists of 17,000 posts with both subjective and objective emotional intensities for Plutchik’s eight emotions ([Plutchik, 1980](https://www.sciencedirect.com/science/article/pii/B9780125587013500077)), which are given on a four-point scale (no, weak, medium, and strong).

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

- Japanese

## Dataset Structure

### Data Instances

When loading the dataset, users have to select a version by passing its configuration name (`ver1` or `ver2`):

```python
from datasets import load_dataset

# Select the dataset version through the configuration name.
dataset = load_dataset("shunk031/wrime", name="ver1")

print(dataset)
# DatasetDict({
#     train: Dataset({
#         features: ['sentence', 'user_id', 'datetime', 'writer', 'reader1', 'reader2', 'reader3', 'avg_readers'],
#         num_rows: 40000
#     })
#     validation: Dataset({
#         features: ['sentence', 'user_id', 'datetime', 'writer', 'reader1', 'reader2', 'reader3', 'avg_readers'],
#         num_rows: 1200
#     })
#     test: Dataset({
#         features: ['sentence', 'user_id', 'datetime', 'writer', 'reader1', 'reader2', 'reader3', 'avg_readers'],
#         num_rows: 2000
#     })
# })
```

#### Ver. 1

An example of Ver. 1 looks as follows:

```json
{
  "sentence": "ぼけっとしてたらこんな時間｡チャリあるから食べにでたいのに…",
  "user_id": "1",
  "datetime": "2012/07/31 23:48",
  "writer": {"joy": 0, "sadness": 1, "anticipation": 2, "surprise": 1, "anger": 1, "fear": 0, "disgust": 0, "trust": 1},
  "reader1": {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 0, "anger": 0, "fear": 0, "disgust": 0, "trust": 0},
  "reader2": {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 1, "anger": 0, "fear": 0, "disgust": 0, "trust": 0},
  "reader3": {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 0, "anger": 0, "fear": 1, "disgust": 1, "trust": 0},
  "avg_readers": {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 0, "anger": 0, "fear": 0, "disgust": 0, "trust": 0}
}
```

#### Ver. 2

An example of Ver. 2, which adds a `sentiment` field to each annotation, looks as follows:

```json
{
  "sentence": "ぼけっとしてたらこんな時間。チャリあるから食べにでたいのに…",
  "user_id": "1",
  "datetime": "2012/7/31 23:48",
  "writer": {"joy": 0, "sadness": 1, "anticipation": 2, "surprise": 1, "anger": 1, "fear": 0, "disgust": 0, "trust": 1, "sentiment": 0},
  "reader1": {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 0, "anger": 0, "fear": 0, "disgust": 0, "trust": 0, "sentiment": -2},
  "reader2": {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 0, "anger": 0, "fear": 1, "disgust": 1, "trust": 0, "sentiment": -1},
  "reader3": {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 1, "anger": 0, "fear": 0, "disgust": 0, "trust": 0, "sentiment": -1},
  "avg_readers": {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 0, "anger": 0, "fear": 0, "disgust": 0, "trust": 0, "sentiment": -1}
}
```

### Data Fields

#### Ver. 1

- `sentence`: post text
- `user_id`: user ID
- `datetime`: date and time of the post
- `writer`: subjective labels (writer)
  - `joy`: writer's subjective joy
  - `sadness`: writer's subjective sadness
  - `anticipation`: writer's subjective anticipation
  - `surprise`: writer's subjective surprise
  - `anger`: writer's subjective anger
  - `fear`: writer's subjective fear
  - `disgust`: writer's subjective disgust
  - `trust`: writer's subjective trust
- `reader1`: objective labels A (reader A)
  - `joy`: reader A's objective joy
  - `sadness`: reader A's objective sadness
  - `anticipation`: reader A's objective anticipation
  - `surprise`: reader A's objective surprise
  - `anger`: reader A's objective anger
  - `fear`: reader A's objective fear
  - `disgust`: reader A's objective disgust
  - `trust`: reader A's objective trust
- `reader2`: objective labels B (reader B)
  - `joy`: reader B's objective joy
  - `sadness`: reader B's objective sadness
  - `anticipation`: reader B's objective anticipation
  - `surprise`: reader B's objective surprise
  - `anger`: reader B's objective anger
  - `fear`: reader B's objective fear
  - `disgust`: reader B's objective disgust
  - `trust`: reader B's objective trust
- `reader3`: objective labels C (reader C)
  - `joy`: reader C's objective joy
  - `sadness`: reader C's objective sadness
  - `anticipation`: reader C's objective anticipation
  - `surprise`: reader C's objective surprise
  - `anger`: reader C's objective anger
  - `fear`: reader C's objective fear
  - `disgust`: reader C's objective disgust
  - `trust`: reader C's objective trust
- `avg_readers`: average of the objective labels of readers A, B, and C
  - `joy`: average joy over readers A, B, and C
  - `sadness`: average sadness over readers A, B, and C
  - `anticipation`: average anticipation over readers A, B, and C
  - `surprise`: average surprise over readers A, B, and C
  - `anger`: average anger over readers A, B, and C
  - `fear`: average fear over readers A, B, and C
  - `disgust`: average disgust over readers A, B, and C
  - `trust`: average trust over readers A, B, and C

#### Ver. 2

- `sentence`: post text
- `user_id`: user ID
- `datetime`: date and time of the post
- `writer`: subjective labels (writer)
  - `joy`: writer's subjective joy
  - `sadness`: writer's subjective sadness
  - `anticipation`: writer's subjective anticipation
  - `surprise`: writer's subjective surprise
  - `anger`: writer's subjective anger
  - `fear`: writer's subjective fear
  - `disgust`: writer's subjective disgust
  - `trust`: writer's subjective trust
  - `sentiment`: writer's subjective sentiment polarity
- `reader1`: objective labels A (reader A)
  - `joy`: reader A's objective joy
  - `sadness`: reader A's objective sadness
  - `anticipation`: reader A's objective anticipation
  - `surprise`: reader A's objective surprise
  - `anger`: reader A's objective anger
  - `fear`: reader A's objective fear
  - `disgust`: reader A's objective disgust
  - `trust`: reader A's objective trust
  - `sentiment`: reader A's objective sentiment polarity
- `reader2`: objective labels B (reader B)
  - `joy`: reader B's objective joy
  - `sadness`: reader B's objective sadness
  - `anticipation`: reader B's objective anticipation
  - `surprise`: reader B's objective surprise
  - `anger`: reader B's objective anger
  - `fear`: reader B's objective fear
  - `disgust`: reader B's objective disgust
  - `trust`: reader B's objective trust
  - `sentiment`: reader B's objective sentiment polarity
- `reader3`: objective labels C (reader C)
  - `joy`: reader C's objective joy
  - `sadness`: reader C's objective sadness
  - `anticipation`: reader C's objective anticipation
  - `surprise`: reader C's objective surprise
  - `anger`: reader C's objective anger
  - `fear`: reader C's objective fear
  - `disgust`: reader C's objective disgust
  - `trust`: reader C's objective trust
  - `sentiment`: reader C's objective sentiment polarity
- `avg_readers`: average of the objective labels of readers A, B, and C
  - `joy`: average joy over readers A, B, and C
  - `sadness`: average sadness over readers A, B, and C
  - `anticipation`: average anticipation over readers A, B, and C
  - `surprise`: average surprise over readers A, B, and C
  - `anger`: average anger over readers A, B, and C
  - `fear`: average fear over readers A, B, and C
  - `disgust`: average disgust over readers A, B, and C
  - `trust`: average trust over readers A, B, and C
  - `sentiment`: average sentiment polarity over readers A, B, and C

### Data Splits

| name | train  | validation | test  |
|------|-------:|-----------:|------:|
| ver1 | 40,000 |      1,200 | 2,000 |
| ver2 | 30,000 |      2,500 | 2,500 |

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?
[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

From [the README](https://github.com/ids-cv/wrime/blob/master/README.en.md#licence) of the GitHub repository:

- The dataset is available for research purposes only.
- Redistribution of the dataset is prohibited.

### Citation Information

```bibtex
@inproceedings{kajiwara-etal-2021-wrime,
    title = "{WRIME}: A New Dataset for Emotional Intensity Estimation with Subjective and Objective Annotations",
    author = "Kajiwara, Tomoyuki and Chu, Chenhui and Takemura, Noriko and Nakashima, Yuta and Nagahara, Hajime",
    booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jun,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.naacl-main.169",
    doi = "10.18653/v1/2021.naacl-main.169",
    pages = "2095--2104",
    abstract = "We annotate 17,000 SNS posts with both the writer{'}s subjective emotional intensity and the reader{'}s objective one to construct a Japanese emotion analysis dataset. In this study, we explore the difference between the emotional intensity of the writer and that of the readers with this dataset. We found that the reader cannot fully detect the emotions of the writer, especially anger and trust. In addition, experimental results in estimating the emotional intensity show that it is more difficult to estimate the writer{'}s subjective labels than the readers{'}. The large gap between the subjective and objective emotions imply the complexity of the mapping from a post to the subjective emotion intensities, which also leads to a lower performance with machine learning models.",
}
```

```bibtex
@inproceedings{suzuki-etal-2022-japanese,
    title = "A {J}apanese Dataset for Subjective and Objective Sentiment Polarity Classification in Micro Blog Domain",
    author = "Suzuki, Haruya and Miyauchi, Yuto and Akiyama, Kazuki and Kajiwara, Tomoyuki and Ninomiya, Takashi and Takemura, Noriko and Nakashima, Yuta and Nagahara, Hajime",
    booktitle = "Proceedings of the Thirteenth Language Resources and Evaluation Conference",
    month = jun,
    year = "2022",
    address = "Marseille, France",
    publisher = "European Language Resources Association",
    url = "https://aclanthology.org/2022.lrec-1.759",
    pages = "7022--7028",
    abstract = "We annotate 35,000 SNS posts with both the writer{'}s subjective sentiment polarity labels and the reader{'}s objective ones to construct a Japanese sentiment analysis dataset. Our dataset includes intensity labels (\textit{none}, \textit{weak}, \textit{medium}, and \textit{strong}) for each of the eight basic emotions by Plutchik (\textit{joy}, \textit{sadness}, \textit{anticipation}, \textit{surprise}, \textit{anger}, \textit{fear}, \textit{disgust}, and \textit{trust}) as well as sentiment polarity labels (\textit{strong positive}, \textit{positive}, \textit{neutral}, \textit{negative}, and \textit{strong negative}). Previous studies on emotion analysis have studied the analysis of basic emotions and sentiment polarity independently. In other words, there are few corpora that are annotated with both basic emotions and sentiment polarity. Our dataset is the first large-scale corpus to annotate both of these emotion labels, and from both the writer{'}s and reader{'}s perspectives. In this paper, we analyze the relationship between basic emotion intensity and sentiment polarity on our dataset and report the results of benchmarking sentiment polarity classification.",
}
```

### Contributions

Thanks to [@moguranosenshi](https://github.com/moguranosenshi) for creating this dataset.

Provider:

shunk031

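Summary of Original Information

Dataset Overview

Dataset Name

Name: WRIME
Alias: none

Dataset Attributes

Language: Japanese
Multilinguality: monolingual
License: unknown
Tags: sentiment-analysis, wrime
Task category: text classification
Task ID: sentiment classification

Dataset Versions

Versions: ver1, ver2

Dataset Metrics

Metric: accuracy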

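Dataset Description

Summary: WRIME is a new dataset for emotional intensity estimation. It collects both the writers' subjective emotional intensity and the readers' objective emotional intensity, and explores the differences between the two. The dataset contains 17,000 posts covering Plutchik's eight emotions, annotated on a four-point scale (no, weak, medium, strong).

Dataset Structure

Data instances: The dataset provides train, validation, and test splits. The structure differs slightly between versions, but each instance contains the sentence, user ID, timestamp, and the emotional intensities assigned by the writer and the readers.
Data fields: The sentence, user ID, timestamp, and the writer's and readers' emotional intensities. Ver. 2 additionally carries a `sentiment` (polarity) field in each annotation.
Data splits: Split sizes differ by version: ver1 has 40,000 / 1,200 / 2,000 rows and ver2 has 30,000 / 2,500 / 2,500 rows for train / validation / test.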
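The relationship between the three reader annotations and `avg_readers` can be sketched as follows. This is a minimal illustration that assumes `avg_readers` is the per-label mean of `reader1`, `reader2`, and `reader3` rounded to the nearest integer; the dataset card does not state the rounding rule, and the helper name `rounded_reader_mean` is made up for this example. The reader values are copied from the Ver. 1 example in the card.

```python
EMOTIONS = ("joy", "sadness", "anticipation", "surprise",
            "anger", "fear", "disgust", "trust")


def rounded_reader_mean(record, keys=EMOTIONS):
    """Average the three readers' labels per key and round to an integer.

    Assumption: this mirrors how `avg_readers` was derived; the source
    only says it is the average over readers A, B, and C.
    """
    readers = [record["reader1"], record["reader2"], record["reader3"]]
    return {k: round(sum(r[k] for r in readers) / len(readers)) for k in keys}


# Reader annotations copied from the Ver. 1 example in the dataset card.
example = {
    "reader1": {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 0,
                "anger": 0, "fear": 0, "disgust": 0, "trust": 0},
    "reader2": {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 1,
                "anger": 0, "fear": 0, "disgust": 0, "trust": 0},
    "reader3": {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 0,
                "anger": 0, "fear": 1, "disgust": 1, "trust": 0},
    "avg_readers": {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 0,
                    "anger": 0, "fear": 0, "disgust": 0, "trust": 0},
}

# Compare the recomputed average with the value stored in the record.
print(rounded_reader_mean(example) == example["avg_readers"])  # True
```

The check passes for this single record, which is consistent with the rounding assumption but does not prove it for the whole dataset.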

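Dataset Creation

Annotators: participants hired through a crowdsourcing service
Annotation process: The 50 participants annotated their own past social networking service (SNS) posts with subjective emotional intensity, and 3 hired annotators labeled all posts with objective emotional intensity.

Considerations for Using the Data

Licensing information: The dataset is available for research purposes only; redistribution is prohibited.
Citation information: BibTeX entries are provided for the NAACL 2021 paper (WRIME) and the LREC 2022 paper (sentiment polarity labels).

Contributors

Creator: @moguranosenshi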

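Collected Summary

Dataset Introduction

Construction

In affective computing, a high-quality dataset for emotional intensity estimation is essential for understanding the complex emotions expressed in text. To build WRIME, 50 participants were recruited through a crowdsourcing service; their past posts on a social networking service were collected, and each writer annotated their own posts with subjective emotional intensity. Three independent annotators were also hired to label all posts with objective emotional intensity. The resulting dataset contains 17,000 Japanese posts, each annotated with subjective and objective intensities for Plutchik's eight basic emotions on four levels (no, weak, medium, strong), giving emotion analysis research a resource annotated from two perspectives.

Features

The distinguishing feature of WRIME is its dual-perspective annotation scheme, which captures both the writer's subjective emotion and the readers' objective perception. Following Plutchik's wheel of emotions, it grades the intensity of eight basic emotions: joy, sadness, anticipation, surprise, anger, fear, disgust, and trust. The dataset comes in two versions, and the second adds sentiment polarity labels. Each version ships with explicit train, validation, and test splits, which supports reproducible training and evaluation and provides a solid basis for studying the difference between subjective and objective emotion.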
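As an illustration of the label scheme, the integer annotations can be decoded into the label names used in the papers. This is a sketch under an assumption: the dataset card lists the names (no, weak, medium, strong; strong negative to strong positive) and shows integer values, but does not spell out the mapping, so `INTENSITY_NAMES`, `POLARITY_NAMES`, and the helper `describe` below are hypothetical.

```python
# Assumed mapping from integer intensity to the four-point scale names.
INTENSITY_NAMES = {0: "no", 1: "weak", 2: "medium", 3: "strong"}

# Assumed mapping from integer polarity (Ver. 2 `sentiment`) to names.
POLARITY_NAMES = {
    -2: "strong negative",
    -1: "negative",
    0: "neutral",
    1: "positive",
    2: "strong positive",
}


def describe(labels):
    """Turn one annotation dict (e.g. record["writer"]) into readable names."""
    readable = {}
    for key, value in labels.items():
        # `sentiment` uses the polarity scale; every other key is an emotion.
        names = POLARITY_NAMES if key == "sentiment" else INTENSITY_NAMES
        readable[key] = names[value]
    return readable


# Writer annotation copied from the Ver. 2 example in the dataset card.
writer = {"joy": 0, "sadness": 1, "anticipation": 2, "surprise": 1,
          "anger": 1, "fear": 0, "disgust": 0, "trust": 1, "sentiment": 0}

for key, name in describe(writer).items():
    print(f"{key}: {name}")
```

Because Ver. 1 records have no `sentiment` key, the same helper works for both versions.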

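Usage

In natural language processing research, WRIME is mainly used for emotional intensity estimation and sentiment classification. The dataset can be loaded with the Hugging Face `datasets` library, and a version (`ver1` or `ver2`) must be specified. Each instance contains the sentence, user ID, timestamp, and nested dictionaries of emotion labels. Typical uses include studying the writer's intended emotion through the `writer` field, or analyzing how readers perceive a post by combining the `reader1`, `reader2`, `reader3`, and `avg_readers` fields, then training models to examine where subjective and objective emotions agree, where they differ, and how hard each is to predict.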
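A minimal sketch of the writer-versus-reader comparison described above. It assumes each record has the nested layout shown in the dataset card's examples; the function names `writer_reader_gap` and `stronger_for_writer` are illustrative and not part of any library.

```python
EMOTIONS = ("joy", "sadness", "anticipation", "surprise",
            "anger", "fear", "disgust", "trust")


def writer_reader_gap(record):
    """Per-emotion difference: writer intensity minus averaged reader intensity."""
    return {e: record["writer"][e] - record["avg_readers"][e] for e in EMOTIONS}


def stronger_for_writer(record):
    """Emotions the writer rated higher than the readers did on average."""
    return [e for e, gap in writer_reader_gap(record).items() if gap > 0]


# Annotations copied from the Ver. 1 example in the dataset card.
example = {
    "writer": {"joy": 0, "sadness": 1, "anticipation": 2, "surprise": 1,
               "anger": 1, "fear": 0, "disgust": 0, "trust": 1},
    "avg_readers": {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 0,
                    "anger": 0, "fear": 0, "disgust": 0, "trust": 0},
}

print(writer_reader_gap(example))
print(stronger_for_writer(example))
# ['anticipation', 'surprise', 'anger', 'trust']
```

To run this over the real data, the same functions could be applied to each row of a split returned by `load_dataset("shunk031/wrime", name="ver1")`, assuming the rows are exposed as nested dictionaries with these keys.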

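Background and Challenges

Background Overview

Affective computing has long sought to parse the complex emotions conveyed in text, yet Japanese emotion analysis datasets remain relatively scarce. WRIME was built in 2021 by a Japanese research team including Tomoyuki Kajiwara, and the work was published at NAACL 2021 (the Conference of the North American Chapter of the Association for Computational Linguistics). The dataset is designed to study the gap between the writer's subjective emotional intensity and the readers' objective perception in social media text: following Plutchik's eight basic emotions, 17,000 Japanese posts were annotated on a four-point intensity scale. This design gives emotion analysis models two perspectives at once, the writer's internal expression and the readers' external interpretation, and supports research on emotion understanding in cross-cultural contexts.

Current Challenges

WRIME targets emotional intensity estimation, and its core challenge is modeling the inherent gap between the writer's subjective emotion and the readers' objective perception. The data show that readers cannot fully detect the writer's emotions, especially anger and trust, which makes the writer's subjective labels harder for machine learning models to predict. On the construction side, the main challenge is annotation consistency: self-ratings from 50 writers must be reconciled with ratings from 3 readers while maintaining inter-annotator reliability. The informal expressions and contextual ambiguity of Japanese social media text also make it harder to quantify emotional intensity precisely.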
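One simple way to look at annotation consistency is pairwise exact agreement between annotators on a single post. The source does not say which reliability measure was used, so the metric and the helper `pairwise_agreement` below are assumptions chosen for illustration; chance-corrected measures such as weighted kappa would be a more rigorous choice over a full split.

```python
from itertools import combinations

EMOTIONS = ("joy", "sadness", "anticipation", "surprise",
            "anger", "fear", "disgust", "trust")


def pairwise_agreement(record, annotators=("reader1", "reader2", "reader3")):
    """Fraction of the eight emotions on which each annotator pair agrees exactly."""
    rates = {}
    for a, b in combinations(annotators, 2):
        matches = sum(record[a][e] == record[b][e] for e in EMOTIONS)
        rates[(a, b)] = matches / len(EMOTIONS)
    return rates


# Reader annotations copied from the Ver. 1 example in the dataset card.
example = {
    "reader1": {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 0,
                "anger": 0, "fear": 0, "disgust": 0, "trust": 0},
    "reader2": {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 1,
                "anger": 0, "fear": 0, "disgust": 0, "trust": 0},
    "reader3": {"joy": 0, "sadness": 2, "anticipation": 0, "surprise": 0,
                "anger": 0, "fear": 1, "disgust": 1, "trust": 0},
}

for pair, rate in pairwise_agreement(example).items():
    print(pair, rate)
# ('reader1', 'reader2') 0.875
# ('reader1', 'reader3') 0.75
# ('reader2', 'reader3') 0.625
```

Passing `annotators=("writer", "avg_readers")` on a full record would give the analogous writer-versus-reader agreement for that post.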

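Common Scenarios

Classic Use Cases

In affective computing, WRIME gives researchers a resource for studying the difference between subjective and objective emotional intensity. It contains Japanese posts from a social networking service, each labeled with the writer's own subjective intensity and several readers' objective intensities, covering Plutchik's eight basic emotions on four levels. This design makes WRIME a benchmark for emotional intensity estimation, and it is particularly suited to developing and evaluating models that distinguish the emotion a writer feels from the emotion a reader perceives.

Practical Applications

In practice, WRIME provides training and evaluation data for Japanese natural language processing systems that need fine-grained emotional insight. Models built on it can be applied to social media opinion monitoring, where they can identify the overall direction of public sentiment and also estimate the gap between a poster's actual emotional state and the public's reading of it, which can inform crisis communication and mental health monitoring. In personalized recommendation and dialogue systems, such models can help generate more empathetic responses.

Derived Related Work

WRIME has given rise to a line of research on fine-grained Japanese emotion analysis. Its dual-perspective annotation framework prompted further study of emotional subjectivity; for example, baseline evaluations on the dataset showed that machine learning models have more difficulty predicting the writer's subjective labels. Later work extended the annotation with sentiment polarity labels and analyzed the relationship between emotional intensity and polarity, encouraging the use of multi-task learning in affective computing.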

The content above was collected and summarized by 遇见数据集.