theojiang/CIVETv2_key_idea_retrieval_dataset_v3.1_gtebase_msmarco
Source: Hugging Face. Updated 2024-11-21; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/theojiang/CIVETv2_key_idea_retrieval_dataset_v3.1_gtebase_msmarco
Description:
This dataset contains text data for natural language processing tasks. Its main features are passage_input_ids, passage_attention_mask, and question_embeddings. passage_input_ids is a sequence of int64 values and passage_attention_mask is a sequence of float32 values, while question_embeddings is a nested sequence of float32 values. The dataset has two splits: a training set of 507,990 samples and a validation set of 500 samples. File and download sizes are also documented.
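The schema described above can be sketched in Python as follows. This is a minimal illustration, not the author's own loading code. The repository id is taken from the listing. The split names "train" and "validation", the helper names load_split and check_record, and the check that the attention mask has the same length as the token ids are all assumptions made for this example.

```python
from typing import Any, Dict

# Repository id as it appears in the listing.
REPO_ID = "theojiang/CIVETv2_key_idea_retrieval_dataset_v3.1_gtebase_msmarco"

# Split sizes as stated in the description; the split names are assumptions.
EXPECTED_ROWS = {"train": 507_990, "validation": 500}


def load_split(split: str = "validation"):
    """Load one split with the Hugging Face `datasets` library (needs network access)."""
    # Imported lazily so that check_record below can be used without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split)


def check_record(record: Dict[str, Any]) -> None:
    """Raise ValueError if a record does not match the schema in the description."""
    ids = record["passage_input_ids"]
    mask = record["passage_attention_mask"]
    embeddings = record["question_embeddings"]

    # passage_input_ids: a flat sequence of integers (int64 in the stored data).
    if not all(isinstance(token, int) for token in ids):
        raise ValueError("passage_input_ids must be a sequence of integers")

    # passage_attention_mask: a flat sequence of floats (float32 in the stored data).
    if not all(isinstance(value, float) for value in mask):
        raise ValueError("passage_attention_mask must be a sequence of floats")

    # Assumption: the mask is aligned one-to-one with the token ids.
    if len(mask) != len(ids):
        raise ValueError("passage_attention_mask and passage_input_ids differ in length")

    # question_embeddings: a nested sequence, i.e. a list of float vectors.
    for vector in embeddings:
        if not isinstance(vector, (list, tuple)):
            raise ValueError("question_embeddings must be a nested sequence")
        if not all(isinstance(value, float) for value in vector):
            raise ValueError("question_embeddings must contain floats")
```

If the assumed split names are correct, len(load_split("validation")) would be expected to equal EXPECTED_ROWS["validation"], and check_record could be applied to the first record as a quick sanity check before training.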
Provided by:
theojiang



