
kevin017/tokenized_bioS_inverse_QA_large_padding

Hugging Face | Updated 2025-04-03 | Indexed 2025-04-12
Download link:
https://hf-mirror.com/datasets/kevin017/tokenized_bioS_inverse_QA_large_padding
Resource overview:
The dataset has three main feature fields: input_ids, attention_mask, and answers_tokenized. input_ids and attention_mask are integer sequences holding the token indices of the input and the corresponding attention mask. The answers_tokenized field itself contains an attention mask and an index sequence, and appears to hold the tokenized answers. The data is split into a training set of 192886 examples and a test set of 192889 examples. The listing reports a total dataset size of over 1 GB and a download size of approximately 1.5 MB. Because the README gives no detailed description, the specific content and intended use of the dataset cannot be determined.
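To make the field layout described above more concrete, here is a minimal sketch in plain Python of what one record might look like. It rests on several assumptions that the listing does not confirm: that answers_tokenized is a nested mapping with its own input_ids and attention_mask keys, that sequences are right-padded, and that the padding id is 0. The helper names (pad_to_length, make_record, unpadded_length) and the token ids are hypothetical and exist only for illustration.

```python
from typing import Dict, List

# Hypothetical padding token id; the real value depends on the tokenizer used.
PAD_ID = 0


def pad_to_length(token_ids: List[int], max_length: int, pad_id: int = PAD_ID) -> Dict[str, List[int]]:
    """Truncate or right-pad token ids and build the matching attention mask."""
    kept = token_ids[:max_length]
    n_pad = max_length - len(kept)
    return {
        "input_ids": kept + [pad_id] * n_pad,
        # 1 marks a real token, 0 marks a padding position.
        "attention_mask": [1] * len(kept) + [0] * n_pad,
    }


def make_record(question_ids: List[int], answer_ids: List[int], max_length: int) -> dict:
    """Assemble one record with the three fields named in the listing.

    The nested layout of answers_tokenized is an assumption.
    """
    record: dict = pad_to_length(question_ids, max_length)
    record["answers_tokenized"] = pad_to_length(answer_ids, max_length)
    return record


def unpadded_length(attention_mask: List[int]) -> int:
    """Count the real (non-padding) tokens in a sequence."""
    return sum(attention_mask)


# Made-up token ids, for illustration only.
example = make_record([101, 7592, 2088], [2023, 102], max_length=5)
print(example)
print(unpadded_length(example["attention_mask"]))
```

If the actual dataset pads on the left or uses a different padding id, only pad_to_length would need to change; counting real tokens through the attention mask works either way.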
Provider:
kevin017