illuin-conteb/narrative-qa

Name: illuin-conteb/narrative-qa
Creator: illuin-conteb
Published: 2025-06-02 11:12:30
License: not specified

Source: Hugging Face
Updated: 2025-06-02
Indexed: 2025-10-18

Download link:

https://hf-mirror.com/datasets/illuin-conteb/narrative-qa

Overview:

The ConTEB NarrativeQA dataset is part of ConTEB (Context-aware Text Embedding Benchmark), which is designed to evaluate the capabilities of contextual embedding models. It is based on the widely used NarrativeQA dataset and consists of long documents along with their existing sets of question-answer pairs. The dataset is built from a collection of original documents whose text is extracted and split into chunks. These chunks are not always self-contained and require document-level context to build meaningful representations. The dataset provides 8575 queries, 355 documents, and 1750 chunks, with an average of about 152 tokens per chunk. It has two parts, documents and queries: documents holds the chunk information, and queries holds the query text, the answer, and the ID of the related chunk.
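The two-part structure described above can be illustrated with a minimal Python sketch. It uses a few toy records rather than the real data, and the field names (chunk_id, doc_id, text, query, answer) are assumptions made for illustration, not the dataset's confirmed schema. The sketch shows how a query's chunk ID might be resolved to its chunk, and how the sibling chunks of the same document supply the context that a non-self-contained chunk lacks.

```python
from dataclasses import dataclass


# Hypothetical record layouts: field names are assumptions,
# not the confirmed schema of illuin-conteb/narrative-qa.
@dataclass
class Chunk:
    chunk_id: str  # unique ID of the chunk
    doc_id: str    # ID of the parent document
    text: str


@dataclass
class Query:
    query: str
    answer: str
    chunk_id: str  # ID of the chunk related to the query


def build_chunk_index(chunks):
    """Map chunk_id -> Chunk for direct lookup."""
    return {c.chunk_id: c for c in chunks}


def sibling_chunks(chunks, doc_id):
    """Return every chunk of one document, i.e. its document-level context."""
    return [c for c in chunks if c.doc_id == doc_id]


# Toy data, invented for this example only.
chunks = [
    Chunk("doc1_0", "doc1", "Alice moved to Paris in 1920."),
    # "She" and "there" only make sense given chunk doc1_0.
    Chunk("doc1_1", "doc1", "She opened a bookshop there."),
    Chunk("doc2_0", "doc2", "The ship left the harbour at dawn."),
]
queries = [Query("Where did Alice open a bookshop?", "Paris", "doc1_1")]

index = build_chunk_index(chunks)
gold = index[queries[0].chunk_id]            # chunk linked to the query
context = sibling_chunks(chunks, gold.doc_id)  # all chunks of that document

print(gold.text)
print([c.chunk_id for c in context])
```

In this toy case the chunk linked to the query never names Alice or Paris, which is the situation the benchmark targets: an embedding of that chunk alone would miss information that only the rest of the document provides.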

Provided by:

illuin-conteb
