
illuin-conteb/mldr-conteb-eval

Source: Hugging Face. Updated 2025-06-02; indexed 2025-10-18.
Download link:
https://hf-mirror.com/datasets/illuin-conteb/mldr-conteb-eval
Overview:
The ConTEB - MLDR dataset is part of the Context-aware Text Embedding Benchmark (ConTEB), which is designed to evaluate contextual embedding models. It is built on the widely used MLDR dataset and consists of long documents paired with existing sets of question-answer pairs. Text is taken from these pre-existing document collections and split into chunks. The chunks are not always self-contained, so a meaningful representation of a chunk can require the context of the entire document. The dataset therefore provides a focused benchmark for contextualized embeddings, made up of the original documents, the chunks derived from them, and the queries.
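To make the "chunks are not always self-contained" point concrete, here is a minimal toy sketch. It is not ConTEB's evaluation code and does not use the dataset's real schema. The corpus, the fixed 6-word chunking rule, the bag-of-words "embedding", and the helper names (`chunk_words`, `bow`, `contextualize`, `rank_chunks`) are all hypothetical. The `contextualize` step, which adds a weighted whole-document vector to each chunk vector, is a simple stand-in for a contextual embedding model and should not be read as the method any ConTEB model uses.

```python
import math
from collections import Counter


def chunk_words(text, size):
    """Split a document into consecutive chunks of `size` words (hypothetical chunking rule)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def bow(text):
    """Toy bag-of-words 'embedding': lowercase token counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(a[t] * b[t] for t in a)  # missing keys in a Counter read as 0
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def contextualize(chunk_vec, doc_vec, weight):
    """Add `weight` times the whole-document vector to a chunk vector.

    Hypothetical toy stand-in for a contextual embedder; weight=0.0 gives plain chunk vectors.
    """
    out = Counter()
    for t, v in chunk_vec.items():
        out[t] += v
    for t, v in doc_vec.items():
        out[t] += weight * v
    return out


def rank_chunks(query, docs, chunk_size=6, context_weight=0.0):
    """Score every chunk of every document against the query, best first."""
    q = bow(query)
    scored = []
    for doc_id, text in docs.items():
        doc_vec = bow(text)
        for i, chunk in enumerate(chunk_words(text, chunk_size)):
            vec = contextualize(bow(chunk), doc_vec, context_weight)
            scored.append(((doc_id, i), cosine(q, vec)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


# Hypothetical toy corpus: the second chunk of each document never names its subject.
docs = {
    "curie": "marie curie was a polish physicist she won two nobel prizes overall",
    "turing": "alan turing was a british mathematician he won zero nobel prizes overall",
}
query = "how many nobel prizes did curie win"

print(rank_chunks(query, docs)[:2])                      # chunk-only scores
print(rank_chunks(query, docs, context_weight=0.5)[:2])  # with document context
```

Without document context, the chunks "she won two nobel prizes overall" and "he won zero nobel prizes overall" receive identical scores, so the query cannot tell them apart. With `context_weight=0.5`, the Curie chunk inherits the word "curie" from its document and ranks first. This is the kind of dependency on the surrounding document that the benchmark is meant to probe.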
Provided by:
illuin-conteb