stephantulkens/msmarco-query-gte-modernbert-pooled

Name: stephantulkens/msmarco-query-gte-modernbert-pooled
Creator: stephantulkens
Published: 2025-10-29 19:51:10
License: No description available

Source: Hugging Face. Updated: 2025-10-29. Indexed: 2025-11-15.

Download link:

https://hf-mirror.com/datasets/stephantulkens/msmarco-query-gte-modernbert-pooled

Description:

This dataset contains the texts of sentence-transformers/msmarco-corpus, embedded with the Alibaba-NLP/gte-modernbert-base model. Each text is embedded directly, with no additional instruction. The embeddings are 768-dimensional and are suited to large-scale distillation, retrieval, and similarity search. The dataset has a train split of 1,010,916 examples. The README describes the embedding model, the dataset schema, and how to handle the embeddings, and it acknowledges a GPU grant from Mixedbread AI for research.
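To make the similarity-search use case concrete, here is a minimal sketch of cosine top-k search over 768-dimensional vectors. It is an illustration only: the vectors are synthetic stand-ins, the helper names `l2_normalize` and `top_k_cosine` are hypothetical, and the real embedding column name is not given in this listing and would need to be checked in the README.

```python
import math
import random

EMBEDDING_DIM = 768  # dimensionality stated in the dataset description


def l2_normalize(vec):
    """Return a unit-length copy of vec (unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def top_k_cosine(query, corpus, k=3):
    """Return the k (index, cosine similarity) pairs most similar to query."""
    q = l2_normalize(query)
    scored = []
    for idx, row in enumerate(corpus):
        r = l2_normalize(row)
        scored.append((idx, sum(a * b for a, b in zip(q, r))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Synthetic stand-in vectors (assumption): real rows would come from the
# dataset's embedding column, whose name is not given in this listing.
rng = random.Random(0)
corpus = [[rng.gauss(0.0, 1.0) for _ in range(EMBEDDING_DIM)] for _ in range(100)]

# Build a query as a slightly perturbed copy of row 42, so that row 42
# should come back as the nearest neighbour.
query = [x + 0.01 * rng.gauss(0.0, 1.0) for x in corpus[42]]

results = top_k_cosine(query, corpus, k=3)
print(results)
```

With about a million rows, a pure-Python loop like this would be slow; a vectorized matrix product or an approximate nearest-neighbour index would be the more practical choice. The real vectors could presumably be loaded with the Hugging Face `datasets` library by passing the repository ID above and the `train` split to `load_dataset`.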

Provided by:

stephantulkens
