
GloVe: Global Vectors for Word Representation

Source: www.kaggle.com (indexed 2025-03-24)
Download link:
https://www.kaggle.com/rtatman/glove-global-vectors-for-word-representation
Description:
### Context

GloVe is an unsupervised learning algorithm for obtaining vector representations for words. Training is performed on aggregated global word-word co-occurrence statistics from a corpus, and the resulting representations showcase interesting linear substructures of the word vector space.

### Content

This dataset contains English word vectors pre-trained on the combined Wikipedia 2014 + Gigaword 5th Edition corpora (6B tokens, 400K vocab). All tokens are in lowercase. The dataset provides 50-dimensional, 100-dimensional, and 200-dimensional pre-trained word vectors. For 300-dimensional word vectors and additional information, please see the [project website][1].

### Acknowledgements

This data has been released under the [Open Data Commons Public Domain Dedication and License][2]. If you use this dataset in your work, please cite the following paper:

> Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation. URL: https://nlp.stanford.edu/pubs/glove.pdf

### Inspiration

GloVe embeddings have been used in more than 2100 papers, and counting! You can use these pre-trained embeddings whenever you need a way to quantify word co-occurrence (which also captures some aspects of word meaning).

[1]: https://nlp.stanford.edu/projects/glove/
[2]: https://opendatacommons.org/licenses/pddl/
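
To make the pre-trained vectors concrete, here is a minimal loading sketch. It assumes the vectors ship as plain text with one token per line followed by space-separated floats; that layout is not stated in the description above, so check it against the downloaded files. The names `load_glove` and `cosine_similarity` are hypothetical helpers, and the sample data is made up.

```python
import io
import math


def load_glove(handle, expected_dim=None):
    """Parse GloVe-style text: each line is a token followed by its vector components."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], [float(x) for x in parts[1:]]
        if expected_dim is not None and len(values) != expected_dim:
            raise ValueError(f"{word!r}: expected {expected_dim} values, got {len(values)}")
        vectors[word] = values
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


# Made-up 3-dimensional sample standing in for a real file; not real GloVe values.
SAMPLE = "the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n"
vectors = load_glove(io.StringIO(SAMPLE), expected_dim=3)
print(len(vectors), vectors["cat"])
```

For the actual dataset, pass an open file handle instead of the `StringIO` sample, for example `open("glove.6B.50d.txt", encoding="utf-8")` with `expected_dim=50`. That file name is an assumption and may differ in your download.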

Provider:
www.kaggle.com
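Collected summary
Background and challenges
Background overview
GloVe is a word-vector dataset built with unsupervised learning. It is trained on global word co-occurrence statistics from the Wikipedia and Gigaword corpora and provides pre-trained English word vectors in 50, 100, and 200 dimensions, with a 400K-word vocabulary and vectors that exhibit linear substructure. The dataset is openly licensed and widely used in natural language processing tasks.
The summary above was collected and generated by 遇见数据集.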
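
The linear substructure mentioned in the summary is commonly illustrated with vector arithmetic on word analogies. The sketch below shows the idea on made-up two-dimensional vectors, chosen so that the arithmetic works out exactly; real GloVe vectors have 50 or more dimensions and only approximate such relations. The `analogy` helper and the `TOY` table are hypothetical.

```python
import math

# Made-up 2-dimensional vectors for illustration only; not real GloVe values.
TOY = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [3.0, 0.0],
    "queen": [3.0, 1.0],
    "apple": [-1.0, -2.0],
}


def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def analogy(vectors, a, b, c):
    """Return the word whose vector is closest to vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three query words so the answer is a new word.
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy(TOY, "man", "king", "woman"))  # prints "queen" for these toy vectors
```

Cosine similarity is used for ranking because it compares direction rather than length, which is a common choice for word vectors, though not the only one.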