Sentiment Lexicons for 81 Languages
Source: www.kaggle.com. Updated 2017-09-13; indexed 2025-01-21.
Download link: https://www.kaggle.com/rtatman/sentiment-lexicons-for-81-languages
Description:
### Context:
Sentiment analysis, the task of automatically detecting whether a piece of text is positive or negative, generally relies on a hand-curated list of words with positive sentiment (good, great, awesome) and negative sentiment (bad, gross, awful). This dataset contains both positive and negative sentiment lexicons for 81 languages.
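As a rough illustration of how such word lists are typically used, the sketch below counts positive and negative tokens and compares the totals. The six words come from the examples above; the function names, the whitespace-style tokenisation, and the "neutral" fallback are assumptions made for this example, not part of the dataset.

```python
import re

# Tiny illustrative lexicons taken from the examples in the text;
# a real lexicon from this dataset would contain far more entries.
POSITIVE = {"good", "great", "awesome"}
NEGATIVE = {"bad", "gross", "awful"}


def sentiment_score(text, positive=POSITIVE, negative=NEGATIVE):
    """Return (number of positive tokens) - (number of negative tokens)."""
    tokens = re.findall(r"\w+", text.lower())
    pos = sum(token in positive for token in tokens)
    neg = sum(token in negative for token in tokens)
    return pos - neg


def classify(text):
    """Map the score to a coarse label (hypothetical three-way scheme)."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("The food was great and the staff were awesome"))
print(classify("What an awful, gross place"))
```

A counting approach like this ignores negation and word order, so it is best read as a baseline rather than a complete sentiment classifier.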
### Content:
The sentiment lexicons in this dataset were generated via graph propagation over a knowledge graph, a graph representation of real-world entities and the links between them. The general intuition is that words which are closely linked in a knowledge graph probably have similar sentiment polarities. For this project, the sentiment labels were derived from English sentiment lexicons.
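To make the intuition concrete, here is a minimal sketch of polarity propagation on a toy graph. It is not the authors' actual algorithm: the averaging rule, the iteration count, and the edges linking English seed words to Spanish and French words are all assumptions chosen for illustration.

```python
from collections import defaultdict


def propagate_sentiment(edges, seeds, iterations=20):
    """Spread seed polarities (+1 / -1) over an undirected word graph.

    Each non-seed word repeatedly takes the mean score of its neighbours;
    seed words stay clamped to their original polarity.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Unlabelled words start at 0.0; seeds start at their known polarity.
    scores = {word: 0.0 for word in neighbours}
    scores.update(seeds)

    for _ in range(iterations):
        updated = dict(scores)
        for word, linked in neighbours.items():
            if word not in seeds:
                updated[word] = sum(scores[n] for n in linked) / len(linked)
        scores = updated
    return scores


# Hypothetical edges: English seeds linked to Spanish and French words.
edges = [("good", "bueno"), ("bueno", "bon"),
         ("bad", "malo"), ("malo", "mauvais")]
seeds = {"good": 1.0, "bad": -1.0}

for word, score in sorted(propagate_sentiment(edges, seeds).items()):
    print(f"{word}: {score:+.2f}")
```

Words whose final score is positive would go into the positive lexicon and words with a negative score into the negative one; a threshold around zero could be used to drop words that stay ambiguous.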
This dataset contains sentiment lexicons for the following languages:
* Afrikaans
* Albanian
* Arabic
* Aragonese
* Armenian
* Azerbaijani
* Basque
* Belarusian
* Bengali
* Bosnian
* Breton
* Bulgarian
* Catalan
* Chinese
* Croatian
* Czech
* Danish
* Dutch
* Esperanto
* Estonian
* Faroese
* Finnish
* French
* Galician
* Georgian
* German
* Greek
* Gujarati
* Haitian Creole
* Hebrew
* Hindi
* Hungarian
* Icelandic
* Ido
* Indonesian
* Interlingua
* Irish
* Italian
* Kannada
* Khmer
* Kirghiz
* Korean
* Kurdish
* Latin
* Latvian
* Lithuanian
* Luxembourgish
* Macedonian
* Malay
* Maltese
* Marathi
* Norwegian
* Norwegian
* Persian
* Polish
* Portuguese
* Romanian
* Romansh
* Russian
* Scottish
* Serbian
* Slovak
* Slovene
* Spanish
* Swahili
* Swedish
* Tagalog
* Tamil
* Telugu
* Thai
* Turkish
* Turkmen
* Ukrainian
* Urdu
* Uzbek
* Vietnamese
* Volapük
* Walloon
* Welsh
* Western Frisian
* Yiddish
For more information and additional sentiment lexicons, please visit [the project’s website](https://sites.google.com/site/datascienceslab/projects/multilingualsentiment).
### Acknowledgements:
This dataset was collected by Yanqing Chen and Steven Skiena. If you use it in your work, please cite the following paper:
Chen, Y., & Skiena, S. (2014). Building Sentiment Lexicons for All Major Languages. In ACL (2) (pp. 383-389).
It is distributed here under the [GNU General Public License](http://www.gnu.org/licenses/gpl-3.0.html). Note that this is the full GPL, which allows many free uses, but does not allow its incorporation into any type of distributed proprietary software, even in part or in translation. For commercial applications please contact the dataset creators.
### Inspiration:
* These word lists contain many words with similar meanings. Can you automatically detect which words are [cognates](https://en.wikipedia.org/wiki/Cognate)?
* Can you use these sentiment lexicons to reverse-engineer the knowledge graphs that generated them?
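
One possible starting point for the cognate question is plain string similarity between words from two languages. The sketch below uses `difflib.SequenceMatcher` from the Python standard library; the word lists and the 0.75 threshold are hypothetical, and high surface similarity only suggests a candidate pair, since true cognates share an etymological origin that spelling alone cannot confirm.

```python
from difflib import SequenceMatcher
from itertools import product


def cognate_candidates(words_a, words_b, threshold=0.75):
    """Return (word_a, word_b, similarity) pairs at or above the threshold."""
    pairs = []
    for a, b in product(words_a, words_b):
        similarity = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if similarity >= threshold:
            pairs.append((a, b, round(similarity, 2)))
    # Most similar pairs first.
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Hypothetical samples; real input would be two lexicon files from the dataset.
english = ["fantastic", "terrible", "good"]
italian = ["fantastico", "terribile", "buono"]

for a, b, similarity in cognate_candidates(english, italian):
    print(a, b, similarity)
```

This approach only works for languages that share a script; comparing, say, Russian with Spanish would first need transliteration.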
Provider: Kaggle



