DGurgurov/hebrew_sa

Source: Hugging Face | Updated: 2024-05-30 | Indexed: 2024-06-12
Download link:
https://hf-mirror.com/datasets/DGurgurov/hebrew_sa
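The repository can probably also be fetched programmatically. The following is a minimal sketch, assuming the third-party `datasets` library is installed and that the repository loads with its default configuration; the helper name `load_hebrew_sa` is hypothetical and not part of the dataset card, and the split and column names are not stated on this page.

```python
# Repository id taken from the page title and the download URL above.
DATASET_ID = "DGurgurov/hebrew_sa"


def load_hebrew_sa():
    """Download the dataset from the Hugging Face Hub and return it.

    Assumption: the third-party `datasets` package is installed and the
    repository loads with its default configuration.
    """
    # Imported lazily so this module can be imported without `datasets`.
    from datasets import load_dataset

    return load_dataset(DATASET_ID)
```

Calling `print(load_hebrew_sa())` should list whatever splits and columns the repository defines; inspect that output before relying on specific names.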
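Description:
This dataset contains the sentiment analysis data from Amram et al. (2018). It is used mainly in a project on improving word embeddings with graph knowledge for low-resource languages. The data is built from 12K social media comments and is provided in two instances: token-based and morpheme-based. The original study finds that the choice of representation has a marked effect on task performance, particularly for a morphologically rich language such as Hebrew.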
Provided by:
DGurgurov
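Summary of original information

Hebrew Sentiment Analysis Dataset

Dataset description: This dataset contains the sentiment analysis dataset from Amram et al. (2018).

Data structure: The data is used in a project on improving word embeddings with graph knowledge for low-resource languages.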

Citation (BibTeX):

@inproceedings{amram-etal-2018-representations,
    title = "Representations and Architectures in Neural Sentiment Analysis for Morphologically Rich Languages: A Case Study from {M}odern {H}ebrew",
    author = "Amram, Adam and Ben David, Anat and Tsarfaty, Reut",
    editor = "Bender, Emily M. and Derczynski, Leon and Isabelle, Pierre",
    booktitle = "Proceedings of the 27th International Conference on Computational Linguistics",
    month = aug,
    year = "2018",
    address = "Santa Fe, New Mexico, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/C18-1190",
    pages = "2242--2252",
    abstract = "This paper empirically studies the effects of representation choices on neural sentiment analysis for Modern Hebrew, a morphologically rich language (MRL) for which no sentiment analyzer currently exists. We study two dimensions of representational choices: (i) the granularity of the input signal (token-based vs. morpheme-based), and (ii) the level of encoding of vocabulary items (string-based vs. character-based). We hypothesise that for MRLs, languages where multiple meaning-bearing elements may be carried by a single space-delimited token, these choices will have measurable effects on task performance, and that these effects may vary for different architectural designs {---} fully-connected, convolutional or recurrent. Specifically, we hypothesize that morpheme-based representations will have advantages in terms of their generalization capacity and task accuracy, due to their better OOV coverage. To empirically study these effects, we develop a new sentiment analysis benchmark for Hebrew, based on 12K social media comments, and provide two instances of these data: in token-based and morpheme-based settings. Our experiments show that representation choices empirical effects vary with architecture type. While fully-connected and convolutional networks slightly prefer token-based settings, RNNs benefit from a morpheme-based representation, in accord with the hypothesis that explicit morphological information may help generalize. Our endeavour also delivers the first state-of-the-art broad-coverage sentiment analyzer for Hebrew, with over 89{\%} accuracy, alongside an established benchmark to further study the effects of linguistic representation choices on neural networks{'} task performance.",
}