tomaarsen/NanoQuoraRetrieval-bm25

Name: tomaarsen/NanoQuoraRetrieval-bm25
Creator: tomaarsen
Published: 2025-02-03 17:34:06
License: not specified

Hugging Face | Updated 2025-02-03 | Indexed 2025-02-15

Download link:

https://hf-mirror.com/datasets/tomaarsen/NanoQuoraRetrieval-bm25

Description:

The dataset consists of three parts: a text collection (corpus), a set of queries (queries), and relevance information (relevance). Each corpus entry has a unique identifier and its text content, and each query entry likewise has an identifier and its text content. Each relevance entry holds a query identifier, a sequence of identifiers of the texts that are relevant (positive) for that query, and a sequence of text identifiers ranked by the BM25 algorithm. Each part provides a train split that can be used to train retrieval-related models.
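One way to use the relevance part is to check how often the BM25 ranking already surfaces the positive texts. The sketch below computes recall@k over a few toy records shaped like the description above. The field names `query_id`, `positive_ids`, and `bm25_ranked_ids` and the toy identifiers are hypothetical stand-ins, not the dataset's confirmed column names, so adjust them to the actual schema before use.

```python
from typing import Dict, List, Sequence


def recall_at_k(positive_ids: Sequence[str], ranked_ids: Sequence[str], k: int) -> float:
    """Fraction of the positive ids that appear in the first k ranked ids."""
    unique_positives = set(positive_ids)
    if not unique_positives:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(unique_positives & top_k) / len(unique_positives)


def mean_recall_at_k(rows: List[Dict], k: int) -> float:
    """Average recall@k over all relevance rows."""
    if not rows:
        return 0.0
    scores = [recall_at_k(row["positive_ids"], row["bm25_ranked_ids"], k) for row in rows]
    return sum(scores) / len(scores)


# Toy relevance rows; field names and ids are hypothetical, not taken from the dataset.
relevance = [
    {"query_id": "q1", "positive_ids": ["d1", "d4"], "bm25_ranked_ids": ["d4", "d2", "d1", "d3"]},
    {"query_id": "q2", "positive_ids": ["d3"], "bm25_ranked_ids": ["d1", "d2", "d3", "d4"]},
]

if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"recall@{k} = {mean_recall_at_k(relevance, k):.2f}")
```

A low recall@k at small k suggests that the top of the BM25 ranking contains many non-relevant texts, which is the situation in which such rankings are commonly mined for hard negatives.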

Provider:

tomaarsen
