efarrall/word_embeddings

Name: efarrall/word_embeddings
Creator: efarrall
Published: 2024-04-03 15:17:35
License: No description provided

Source: Hugging Face. Updated 2024-04-03. Indexed 2024-06-11.

Download link:

https://hf-mirror.com/datasets/efarrall/word_embeddings


Description:

---
license: mit
---

This dataset contains the embeddings of 8000 words pulled from the wonderwords package (https://pypi.org/project/wonderwords/).

Embedding model: "text-embedding-3-large" from OpenAI

The word embeddings are stored as a dataframe; to read data run:

df_word_embeds = pd.read_pickle("8000words.pkl")

The words used in the embeddings are stored in word_list as a json; to read data run:

with open("word_list", "r") as f:
...     word_list = json.load(f)

Indexes of word_list match indexes of df_word_embeds

Provided by:

efarrall

Summary of original information

Dataset overview

Dataset contents

The dataset contains: embeddings of 8000 words.
Word source: the wonderwords package, available at https://pypi.org/project/wonderwords/.

Embedding model

Model name: text-embedding-3-large.
Provider: OpenAI.

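Data storage

Embedding data: stored as a pandas DataFrame in the file 8000words.pkl.
How to read: call pd.read_pickle("8000words.pkl").
Word list: stored in JSON format in the file word_list.
How to read: call json.load(f), where f is the opened file object.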
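
The two read steps above can be combined into one helper. This is a minimal sketch, not the dataset author's code: the function name `load_embeddings` and the length check are assumptions added for illustration, and the check presumes word_list is a JSON array with one entry per DataFrame row. Note that `pd.read_pickle` unpickles arbitrary Python objects, so it should only be run on files you trust.

```python
import json

import pandas as pd


def load_embeddings(pkl_path="8000words.pkl", word_list_path="word_list"):
    """Load the embedding DataFrame and the word list (hypothetical helper)."""
    # read_pickle can execute code embedded in the file: use trusted files only.
    df_word_embeds = pd.read_pickle(pkl_path)

    # The word list file has no extension but is described as JSON.
    with open(word_list_path, "r", encoding="utf-8") as f:
        word_list = json.load(f)

    # Assumption: one word per row, since the indexes are said to match.
    if len(word_list) != len(df_word_embeds):
        raise ValueError(
            f"{len(word_list)} words but {len(df_word_embeds)} embedding rows"
        )
    return df_word_embeds, word_list
```

With both files downloaded into the working directory, `df_word_embeds, word_list = load_embeddings()` should return the two objects under the names used in the description.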

Data structure

Index correspondence: the indexes of the word list correspond one-to-one with the indexes of the embedding DataFrame.
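
Because position i in the word list refers to row i of the embeddings, a word can be mapped to its vector by position. The sketch below illustrates this with a cosine-similarity neighbor search in plain Python. The names `build_index`, `cosine_similarity`, and `nearest_words` are hypothetical, and the code assumes `vectors` is a sequence where `vectors[i]` is the numeric embedding of `word_list[i]`; the exact column layout of the DataFrame is not documented in the source.

```python
import math


def build_index(word_list):
    """Map each word to its position (later duplicates overwrite earlier ones)."""
    return {word: i for i, word in enumerate(word_list)}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def nearest_words(query, word_list, vectors, k=5):
    """Return the k words most similar to `query` as (word, score) pairs."""
    query_pos = build_index(word_list)[query]
    query_vec = vectors[query_pos]
    scored = [
        (word, cosine_similarity(query_vec, vectors[i]))
        for i, word in enumerate(word_list)
        if i != query_pos  # skip the query word itself
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

If each DataFrame row holds one embedding as numeric columns (an assumption), `vectors = df_word_embeds.to_numpy()` would supply the third argument. For all 8000 words a vectorized NumPy version would be faster; the pure-Python form is used here only to keep the logic visible.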

License

License type: MIT License.
