stephantulkens/mdlr-query-gte-modernbert-pooled

Name: stephantulkens/mdlr-query-gte-modernbert-pooled
Creator: stephantulkens
Published: 2025-10-29 19:55:04
License: No description available

Source: Hugging Face. Updated: 2025-10-29. Indexed: 2025-11-15.

Download link:

https://hf-mirror.com/datasets/stephantulkens/mdlr-query-gte-modernbert-pooled

Resource overview:

This dataset is a version of sentence-transformers/mldr embedded with the Alibaba-NLP/gte-modernbert-base model from the Hugging Face Hub. It contains 10,000 examples in a single train split. Each example has a text field and an embedding field, where the embedding is a 768-dimensional vector representation of the text. The dataset is intended for tasks such as large-scale distillation, retrieval, and similarity search. It is recommended to truncate the text to the model's maximum token length during preprocessing. If a smaller embedding size is needed, the embeddings can be safely sliced to fewer dimensions.
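As a rough illustration of the slicing and similarity-search use described above, the following minimal sketch keeps the first few dimensions of an embedding and ranks a small corpus by dot product. The helper names (slice_embedding, top_k, load_rows) are hypothetical and not part of the dataset. Rescaling to unit length after slicing is an assumption made here so that dot products behave like cosine similarity; the listing itself only states that slicing is safe. The load_rows helper assumes the Hugging Face datasets library is installed and that the repository id matches the download link above.

```python
import math
from typing import List, Sequence, Tuple


def slice_embedding(embedding: Sequence[float], dim: int) -> List[float]:
    """Keep the first `dim` components and rescale the result to unit length."""
    if not 0 < dim <= len(embedding):
        raise ValueError("dim must be between 1 and len(embedding)")
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    # Assumption: renormalize so that dot products act as cosine similarity.
    return [x / norm for x in head] if norm > 0 else head


def top_k(
    query: Sequence[float], corpus: Sequence[Sequence[float]], k: int = 3
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k corpus vectors closest to the query."""
    scores = [
        (i, sum(q * d for q, d in zip(query, doc))) for i, doc in enumerate(corpus)
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


def load_rows(repo_id: str = "stephantulkens/mdlr-query-gte-modernbert-pooled"):
    """Load the train split; each row should expose `text` and `embedding`."""
    # Imported lazily so the helpers above work without the `datasets` package.
    from datasets import load_dataset

    return load_dataset(repo_id, split="train")
```

With the rows loaded, a call such as slice_embedding(row["embedding"], 256) would reduce each 768-dimensional vector to 256 dimensions before indexing. The value 256 is only an example; the listing does not recommend a particular size.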

Provided by:

stephantulkens
