DGurgurov/hebrew_sa

Name: DGurgurov/hebrew_sa
Creator: DGurgurov
Published: 2024-05-30 12:31:04
License: No description available

Hugging Face · Updated 2024-05-30 · Indexed 2024-06-12

Download link:

https://hf-mirror.com/datasets/DGurgurov/hebrew_sa
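As a minimal sketch of how the dataset could be fetched programmatically, the snippet below uses the Hugging Face `datasets` library and points it at the mirror host from the link above through the `HF_ENDPOINT` environment variable. Using the mirror is an assumption based on that link, the helper name `load_hebrew_sa` is hypothetical, and the split and column names are deliberately not assumed: inspect the returned object to see what the repository actually provides.

```python
import os

# Assumption: route Hub traffic through the mirror host from the download link.
# HF_ENDPOINT must be set before `datasets` / `huggingface_hub` are imported.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")

REPO_ID = "DGurgurov/hebrew_sa"


def load_hebrew_sa():
    """Download the dataset from the Hub (requires network access)."""
    # Imported lazily so the endpoint override above is already in place.
    from datasets import load_dataset  # pip install datasets

    # With no `split` argument this returns a DatasetDict keyed by split name.
    return load_dataset(REPO_ID)


# Usage (requires network access):
#   ds = load_hebrew_sa()
#   print(ds)  # shows the available splits and columns
```

Dropping the `os.environ.setdefault` line should make the same call go to the default Hugging Face endpoint instead of the mirror.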

Overview:

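This dataset contains the sentiment analysis data from Amram et al. (2018). It is used primarily in a project that studies how graph knowledge can improve word embeddings for low-resource languages. The data is based on 12K social media comments and is provided in two instances: token-based and morpheme-based. The original study found that the choice of representation has a measurable effect on task performance, particularly for a morphologically rich language such as Hebrew.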
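To make the token-based versus morpheme-based distinction concrete, here is a toy sketch of why morpheme-level units can improve out-of-vocabulary (OOV) coverage. The segmentation table is hand-written for illustration and is an assumption, not the dataset's actual segmentation; the helper names `to_morphemes` and `oov_rate` are likewise hypothetical.

```python
# Hand-written toy segmentation (an assumption for illustration only,
# not taken from the dataset): token -> list of morphemes.
TOY_SEGMENTATION = {
    # "in (a) house"
    "בבית": ["ב", "בית"],
    # "and the boy"
    "והילד": ["ו", "ה", "ילד"],
    # "and in (a) house"
    "ובבית": ["ו", "ב", "בית"],
}


def to_morphemes(tokens):
    """Expand each space-delimited token into its morphemes."""
    return [m for tok in tokens for m in TOY_SEGMENTATION[tok]]


def oov_rate(train_units, test_units):
    """Fraction of test units that never occur in the training units."""
    vocab = set(train_units)
    unseen = [u for u in test_units if u not in vocab]
    return len(unseen) / len(test_units)


train_tokens = ["בבית", "והילד"]
test_tokens = ["ובבית"]

# Token level: the test token was never seen as a whole.
token_oov = oov_rate(train_tokens, test_tokens)
# Morpheme level: every piece of the test token occurs in training.
morph_oov = oov_rate(to_morphemes(train_tokens), to_morphemes(test_tokens))
print(token_oov, morph_oov)
```

In this toy setup the unseen token is fully OOV at the token level, while all of its morphemes already occur in the training data. This is the kind of coverage effect the cited paper hypothesizes about, not a result measured on the dataset.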

Provided by:

DGurgurov

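Summary of original information

Hebrew Sentiment Analysis Dataset

Dataset description: This dataset contains the sentiment analysis dataset of Amram et al. (2018).

Data structure: The data is used in a project on improving word embeddings for low-resource languages with graph knowledge.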

Citation (BibTeX):

@inproceedings{amram-etal-2018-representations,
    title = "Representations and Architectures in Neural Sentiment Analysis for Morphologically Rich Languages: A Case Study from {M}odern {H}ebrew",
    author = "Amram, Adam and Ben David, Anat and Tsarfaty, Reut",
    editor = "Bender, Emily M. and Derczynski, Leon and Isabelle, Pierre",
    booktitle = "Proceedings of the 27th International Conference on Computational Linguistics",
    month = aug,
    year = "2018",
    address = "Santa Fe, New Mexico, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/C18-1190",
    pages = "2242--2252",
    abstract = "This paper empirically studies the effects of representation choices on neural sentiment analysis for Modern Hebrew, a morphologically rich language (MRL) for which no sentiment analyzer currently exists. We study two dimensions of representational choices: (i) the granularity of the input signal (token-based vs. morpheme-based), and (ii) the level of encoding of vocabulary items (string-based vs. character-based). We hypothesise that for MRLs, languages where multiple meaning-bearing elements may be carried by a single space-delimited token, these choices will have measurable effects on task performance, and that these effects may vary for different architectural designs {---} fully-connected, convolutional or recurrent. Specifically, we hypothesize that morpheme-based representations will have advantages in terms of their generalization capacity and task accuracy, due to their better OOV coverage. To empirically study these effects, we develop a new sentiment analysis benchmark for Hebrew, based on 12K social media comments, and provide two instances of these data: in token-based and morpheme-based settings. Our experiments show that representation choices empirical effects vary with architecture type. While fully-connected and convolutional networks slightly prefer token-based settings, RNNs benefit from a morpheme-based representation, in accord with the hypothesis that explicit morphological information may help generalize. Our endeavour also delivers the first state-of-the-art broad-coverage sentiment analyzer for Hebrew, with over 89{\%} accuracy, alongside an established benchmark to further study the effects of linguistic representation choices on neural networks{'} task performance.",
}
