kevin017/tokenized_bioS_inverse_QA_large_padding

Name: kevin017/tokenized_bioS_inverse_QA_large_padding
Creator: kevin017
Published: 2025-04-03 14:40:19
License: No description available

Source: Hugging Face. Updated 2025-04-03. Indexed 2025-04-12.

Download link:

https://hf-mirror.com/datasets/kevin017/tokenized_bioS_inverse_QA_large_padding

Resource description:

The dataset has three main feature fields: input_ids, attention_mask, and answers_tokenized. input_ids and attention_mask are integer sequences holding the token indices of the input and the corresponding attention mask. The answers_tokenized field itself contains an attention mask and an index sequence, and appears to hold the tokenized answers. The dataset is split into a training set of 192886 examples and a test set of 192889 examples. The full dataset is reported as over 1GB, with a download size of approximately 1.5MB. Because the README gives no detailed description, the specific content and purpose of the dataset cannot be determined.
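To make the field layout concrete, here is a minimal sketch of how one record might be inspected. The toy record, the helper names `strip_padding` and `count_real_tokens`, and the assumption that answers_tokenized is a nested mapping with its own input_ids and attention_mask are all illustrative guesses based on the description above, not something the README confirms. The commented-out lines use the `load_dataset` function from the Hugging Face `datasets` library; the split name "train" is likewise assumed.

```python
from typing import Dict, List

# Loading the real data would look roughly like this. It needs network access
# and the `datasets` package, so it is left commented out. The split name
# "train" is an assumption.
#
#   from datasets import load_dataset
#   ds = load_dataset("kevin017/tokenized_bioS_inverse_QA_large_padding")
#   print(ds["train"].features)

# Hypothetical toy record: the field names follow the description, but the
# token values, sequence lengths, and the nesting of answers_tokenized are
# assumptions made for illustration.
record = {
    "input_ids": [101, 2054, 2003, 0, 0],
    "attention_mask": [1, 1, 1, 0, 0],
    "answers_tokenized": {
        "input_ids": [7592, 0, 0],
        "attention_mask": [1, 0, 0],
    },
}


def strip_padding(input_ids: List[int], attention_mask: List[int]) -> List[int]:
    """Keep only the token ids whose attention-mask entry is 1."""
    if len(input_ids) != len(attention_mask):
        raise ValueError("input_ids and attention_mask must have equal length")
    return [tok for tok, keep in zip(input_ids, attention_mask) if keep == 1]


def count_real_tokens(rec: dict) -> Dict[str, int]:
    """Count non-padding tokens in the input and in the tokenized answer."""
    answers = rec["answers_tokenized"]
    return {
        "input_tokens": len(strip_padding(rec["input_ids"], rec["attention_mask"])),
        "answer_tokens": len(
            strip_padding(answers["input_ids"], answers["attention_mask"])
        ),
    }


print(count_real_tokens(record))
```

If the real records turn out to store answers_tokenized differently (for example as a list of such mappings), `count_real_tokens` would need to be adapted accordingly.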

Provider:

kevin017
