laion/ConcatX-M3
Source: Hugging Face. Updated 2024-09-07; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/laion/ConcatX-M3
Resource description:
This dataset contains embeddings of Wikipedia articles. Its main features are Wiki Language (a language identifier), Embeddings (a sequence of embedding vectors), and Version Control. The dataset is divided into two splits: enwiki_embed_concat (English Wikipedia) with 6,575,217 examples and dewiki_embed_concat (German Wikipedia) with 2,565,263 examples. The download size is 54,927,875,336 bytes (54.93 GB) and the total dataset size is 75,262,632,256 bytes (75.26 GB).
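The figures in the description can be sanity-checked with a short script. The sketch below is only an illustration: the constant and function names are made up for this example, and the numbers are copied from the description rather than read from the dataset itself. It suggests that the quoted GB values are decimal gigabytes (10^9 bytes), which matters when planning disk space in GiB.

```python
# Figures copied from the dataset description; names here are illustrative only.
SPLITS = {
    "enwiki_embed_concat": 6_575_217,  # English Wikipedia examples
    "dewiki_embed_concat": 2_565_263,  # German Wikipedia examples
}
DOWNLOAD_BYTES = 54_927_875_336
DATASET_BYTES = 75_262_632_256


def to_gb(num_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_bytes / 10**9


def to_gib(num_bytes: int) -> float:
    """Convert bytes to binary gibibytes (1 GiB = 2**30 bytes)."""
    return num_bytes / 2**30


total_examples = sum(SPLITS.values())
download_gb = to_gb(DOWNLOAD_BYTES)
dataset_gb = to_gb(DATASET_BYTES)
# Rough average on-disk size per example, across both splits.
avg_bytes_per_example = DATASET_BYTES / total_examples

if __name__ == "__main__":
    print(f"total examples: {total_examples:,}")
    print(f"download: {download_gb:.2f} GB ({to_gib(DOWNLOAD_BYTES):.2f} GiB)")
    print(f"dataset:  {dataset_gb:.2f} GB ({to_gib(DATASET_BYTES):.2f} GiB)")
    print(f"average bytes per example: {avg_bytes_per_example:.0f}")
```

The per-example average is only a coarse estimate, since the description does not say how the total size is distributed between the two splits.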
Provider:
laion



