
kevin017/tokenized_bioS_inverse_QA_three_large_padding

Source: Hugging Face. Updated: 2025-04-03. Indexed: 2025-04-12.
Download link: https://hf-mirror.com/datasets/kevin017/tokenized_bioS_inverse_QA_three_large_padding
Description:

The dataset has three feature fields: input_ids, attention_mask, and answers_tokenized. input_ids and attention_mask are integer sequences stored as int32 and int8, respectively. answers_tokenized is a structured field that itself contains two sequences, attention_mask and input_ids, both stored as int64. The dataset is split into a training set and a test set, each containing 34,789 samples. The total download size is 2.49MB, and the full dataset size is approximately 174MB. A default configuration is provided that specifies the paths to the training and test data files.
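The field layout described above can be sketched as a small schema check. This is only an illustration under stated assumptions: the helper names (`EXPECTED_SCHEMA`, `validate_record`, `load_splits`) and the sample record are hypothetical and not part of the dataset, and `load_splits` assumes the `datasets` library is installed and that the repository is still reachable on the Hugging Face Hub under the ID shown in the title.

```python
# Field names and dtypes as stated in the description; everything else is illustrative.
EXPECTED_SCHEMA = {
    "input_ids": "int32",
    "attention_mask": "int8",
    "answers_tokenized": {"attention_mask": "int64", "input_ids": "int64"},
}

# Inclusive value ranges for the signed integer types named in the description.
DTYPE_RANGES = {
    "int8": (-2**7, 2**7 - 1),
    "int32": (-2**31, 2**31 - 1),
    "int64": (-2**63, 2**63 - 1),
}


def fits_dtype(values, dtype):
    """True if `values` is a list/tuple of ints that all fit in `dtype`."""
    if not isinstance(values, (list, tuple)):
        return False
    lo, hi = DTYPE_RANGES[dtype]
    return all(
        isinstance(v, int) and not isinstance(v, bool) and lo <= v <= hi
        for v in values
    )


def validate_record(record, schema=EXPECTED_SCHEMA, prefix=""):
    """Return a list of problems; an empty list means the record matches."""
    problems = []
    for name, spec in schema.items():
        path = prefix + name
        if name not in record:
            problems.append(f"missing field: {path}")
        elif isinstance(spec, dict):
            if isinstance(record[name], dict):
                problems.extend(validate_record(record[name], spec, path + "."))
            else:
                problems.append(f"expected a nested structure: {path}")
        elif not fits_dtype(record[name], spec):
            problems.append(f"values do not fit {spec}: {path}")
    return problems


def load_splits():
    """Load the train and test splits (assumes `datasets` and network access)."""
    from datasets import load_dataset
    return load_dataset("kevin017/tokenized_bioS_inverse_QA_three_large_padding")


# Hypothetical record with made-up token IDs, shaped like the description.
example = {
    "input_ids": [101, 2054, 2003, 0, 0],
    "attention_mask": [1, 1, 1, 0, 0],
    "answers_tokenized": {"attention_mask": [1, 1], "input_ids": [7592, 102]},
}
print(validate_record(example))
```

Calling `validate_record(load_splits()["train"][0])` would apply the same check to a real sample. The check only confirms that values fit the stated integer ranges; it does not inspect the on-disk dtype.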
Provider: kevin017