theojiang/CIVETv2_key_idea_retrieval_dataset_v3.2_gtebase_msmarco

Name: theojiang/CIVETv2_key_idea_retrieval_dataset_v3.2_gtebase_msmarco
Creator: theojiang
Published: 2024-12-03 22:48:01
License: not specified

Hugging Face | Updated 2024-12-03 | Indexed 2024-12-14

Download link:

https://hf-mirror.com/datasets/theojiang/CIVETv2_key_idea_retrieval_dataset_v3.2_gtebase_msmarco

Description:

This dataset contains feature data for natural language processing tasks: input ID sequences, attention mask sequences, and question embedding sequences. It is split into a training set of 55,491 samples and a validation set of 500 samples. The download size is 222,612,190 bytes, and the reported total size is 293,788,706.9510966 bytes.
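As a quick sanity check after downloading, the split sizes stated in the description can be compared against what is actually loaded. The sketch below is a minimal illustration, not the creator's own tooling: it assumes the Hugging Face `datasets` library is installed, that the repository loads with a plain `load_dataset` call, and that the splits are named `train` and `validation` (the listing only says "training set" and "validation set"). The helper `check_split_sizes` is a hypothetical name introduced here.

```python
from typing import Dict, List

REPO_ID = "theojiang/CIVETv2_key_idea_retrieval_dataset_v3.2_gtebase_msmarco"

# Row counts taken from the description; the split names are an assumption.
EXPECTED_ROWS = {"train": 55491, "validation": 500}


def load_splits(repo_id: str = REPO_ID):
    """Download the dataset from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the rest of this sketch runs without the library.
    from datasets import load_dataset

    return load_dataset(repo_id)


def check_split_sizes(actual_rows: Dict[str, int]) -> List[str]:
    """Return the names of splits whose row count differs from the description."""
    return [
        name
        for name, expected in EXPECTED_ROWS.items()
        if actual_rows.get(name) != expected
    ]


# Example usage (commented out because it downloads roughly 222 MB):
# ds = load_splits()
# mismatches = check_split_sizes({name: split.num_rows for name, split in ds.items()})
# print(mismatches or "split sizes match the description")
```

Keeping the size check separate from the download makes it usable on row counts obtained any other way, for example from a local copy.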

Provided by:

theojiang
