lightonai/embeddings-pre-training
Source: Hugging Face. Updated: 2026-04-16. Indexed: 2025-08-30.
Download link:
https://hf-mirror.com/datasets/lightonai/embeddings-pre-training
Description:
This is a large-scale dataset of diverse contrastive pre-training data for developing state-of-the-art text embedding models. The collection is primarily in English and includes several French datasets to support bilingual and cross-lingual research.
Provider:
lightonai



