nomic-embed-text-v1
Source: arXiv, indexed 2025-09-30
Download link:
https://atlas.nomic.ai/map/nomic-text-embed-v1-5m-sample
Description:
This dataset contains contrastive training pairs for text embedding models. After quality filtering, roughly 235 million similar pairs remain. The listing gives the dataset size as 5 million samples, and its intended task is text embedding training.
Provider:
Nomic AI



