Cohere/fineweb-edu-emb

Source: Hugging Face. Updated: 2024-07-03. Added to catalog: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/Cohere/fineweb-edu-emb
Description:
This dataset contains the embeddings for the full fineweb-edu dataset. The data was deduplicated using exact deduplication only. For each parquet file, the emb folder contains two files: new_{parquet_name}.npy and old_{parquet_name}.npy. The old file holds embeddings for text that already appears in the smaller 10B, 100B, and 350B sample subsets. The embeddings were computed with Cohere's embed-multilingual-v3.0 model.

Index: https://huggingface.co/datasets/Cohere/fineweb-edu-index
Corpus: https://huggingface.co/datasets/Cohere/fineweb-edu-corpus
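As a rough illustration of the new_/old_ file layout, the Python sketch below loads the pair of .npy files for one parquet file and runs a brute-force cosine-similarity lookup over them. It is a sketch under stated assumptions, not Cohere's tooling. The helper names (load_embeddings, top_k_cosine) and file names are hypothetical. The description does not say whether {parquet_name} includes the .parquet extension, so the caller passes the name in whatever form matches the files on disk. It also does not say how rows in the .npy files map back to rows in the parquet file; the linked index and corpus datasets are the place to check that. Arrays are cast to float32 because the stored dtype is not stated here.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_embeddings(emb_dir, parquet_name):
    """Load and stack the new_/old_ embedding files for one parquet file.

    Assumption: files are named new_{parquet_name}.npy and old_{parquet_name}.npy
    inside emb_dir, as in the dataset description. Returns the stacked matrix
    and a boolean mask marking rows that came from the old_ file (text seen
    in the 10B/100B/350B samples). Row order relative to the parquet file is
    NOT established by this function.
    """
    emb_dir = Path(emb_dir)
    new = np.load(emb_dir / f"new_{parquet_name}.npy")
    old = np.load(emb_dir / f"old_{parquet_name}.npy")
    # Cast to float32 so the dot products below are not done in a low-precision dtype.
    emb = np.concatenate([new, old], axis=0).astype(np.float32)
    is_old = np.concatenate([np.zeros(len(new), dtype=bool), np.ones(len(old), dtype=bool)])
    return emb, is_old


def top_k_cosine(query, emb, k=5):
    """Return indices and scores of the k rows of emb most cosine-similar to query."""
    q = np.asarray(query, dtype=np.float32)
    q = q / np.linalg.norm(q)
    norms = np.linalg.norm(emb, axis=1)
    scores = (emb @ q) / np.maximum(norms, 1e-12)  # guard against zero-norm rows
    k = max(1, min(k, len(scores)))
    # argpartition finds the k best in O(n); then sort only those k by score.
    idx = np.argpartition(-scores, k - 1)[:k]
    idx = idx[np.argsort(-scores[idx])]
    return idx, scores[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    with tempfile.TemporaryDirectory() as d:
        name = "example"  # hypothetical parquet name for the demo
        np.save(Path(d) / f"new_{name}.npy", rng.normal(size=(3, 8)).astype(np.float16))
        np.save(Path(d) / f"old_{name}.npy", rng.normal(size=(2, 8)).astype(np.float16))
        emb, is_old = load_embeddings(d, name)
        idx, scores = top_k_cosine(emb[4], emb, k=2)
        print(emb.shape, is_old.tolist(), idx[0])
```

Brute-force search like this is only practical for a single file or a small subset; at the scale of the full dataset, a prebuilt index (such as the linked fineweb-edu-index) is the more realistic route.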
Provider:
Cohere