Cohere/fineweb-edu-emb
Source: Hugging Face. Updated: 2024-07-03. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/Cohere/fineweb-edu-emb
Description:
This dataset contains the embeddings for the full fineweb-edu dataset.
The dataset has been deduplicated (using only exact deduplication).
For each parquet file, the emb folder contains two files: new_{parquet_name}.npy and old_{parquet}.npy. The old file holds embeddings for text that has already been seen in the smaller 10B/100B/350B data samples.
The embeddings were computed with the Cohere embed-multilingual-v3.0 model.
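The new/old file layout described above can be read with NumPy. The following is a minimal sketch, not the publisher's loading code. It assumes that each .npy file holds a 2-D array with one embedding per row, and that the placeholder in the file name is the parquet file name without its extension; neither detail is stated in the description. The helper names (embedding_paths, load_embeddings, make_demo_files) and the tiny demo arrays are hypothetical and exist only for illustration.

```python
from pathlib import Path

import numpy as np


def embedding_paths(emb_dir, parquet_name):
    """Return the (new, old) .npy paths for one parquet file.

    Assumption: parquet_name is the parquet file name without extension.
    """
    emb_dir = Path(emb_dir)
    new_path = emb_dir / f"new_{parquet_name}.npy"
    old_path = emb_dir / f"old_{parquet_name}.npy"
    return new_path, old_path


def load_embeddings(emb_dir, parquet_name):
    """Memory-map both embedding files for one parquet file.

    mmap_mode="r" opens the arrays read-only without loading them
    fully into RAM, which helps when the files are large.
    """
    new_path, old_path = embedding_paths(emb_dir, parquet_name)
    new_emb = np.load(new_path, mmap_mode="r")
    old_emb = np.load(old_path, mmap_mode="r")
    return new_emb, old_emb


def make_demo_files(emb_dir, parquet_name, n_new=5, n_old=3, dim=8):
    """Write small random stand-in files so the sketch runs offline.

    These are NOT real embeddings; shapes and dtype are made up.
    """
    rng = np.random.default_rng(0)
    new_path, old_path = embedding_paths(emb_dir, parquet_name)
    np.save(new_path, rng.random((n_new, dim), dtype=np.float32))
    np.save(old_path, rng.random((n_old, dim), dtype=np.float32))
```

With the real files downloaded into a local emb directory, load_embeddings("emb", "<parquet name>") would return the two arrays. Keeping them separate makes it possible to skip the old rows when the smaller 10B/100B/350B samples have already been processed.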
The index of the dataset can be found here:
https://huggingface.co/datasets/Cohere/fineweb-edu-index
The corpus can be found here:
https://huggingface.co/datasets/Cohere/fineweb-edu-corpus
Provider:
Cohere



