AbdallahhSaleh/asd-3-tokenized-padded
Source: Hugging Face. Updated 2025-03-18; indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/AbdallahhSaleh/asd-3-tokenized-padded
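The repository ID in the link above is enough to load the data with the Hugging Face `datasets` library. The following is a minimal sketch, not an official loader. `use_hf_mirror` and `load_train_split` are hypothetical helper names. The sketch also assumes that the mirror at hf-mirror.com serves standard Hub API requests when `HF_ENDPOINT` points at it. Streaming is used because the training split is about 10.8 GB.

```python
import os

DATASET_ID = "AbdallahhSaleh/asd-3-tokenized-padded"


def use_hf_mirror(endpoint: str = "https://hf-mirror.com") -> None:
    """Hypothetical helper: route Hub requests through a mirror.

    Assumption: the mirror accepts standard Hub API calls. HF_ENDPOINT is
    read when huggingface_hub is first imported, so call this before
    anything imports `datasets` or `huggingface_hub`. An HF_ENDPOINT value
    that is already set is left untouched.
    """
    os.environ.setdefault("HF_ENDPOINT", endpoint)


def load_train_split(streaming: bool = True):
    """Hypothetical helper: return the 'train' split of the default config.

    With streaming=True, rows are read lazily from the remote data files
    instead of downloading and caching the whole ~10.8 GB split first.
    """
    # Deferred import so that use_hf_mirror() can still take effect.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split="train", streaming=streaming)
```

For instance, `next(iter(load_train_split()))` should yield a single row as a dict with `input_ids`, `token_type_ids`, and `attention_mask` keys.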
Summary:
The dataset has three features: input_ids, token_type_ids, and attention_mask. Each is a sequence column, and the values are stored as int32 or int8. There is a single training split of 13,798,104 examples totaling 10,762,521,120 bytes (about 10.8 GB). The README provides a default configuration that points to the training data files. It does not describe the dataset's contents or its intended use.
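The size figures can be sanity-checked with simple arithmetic. Dividing 10,762,521,120 bytes by 13,798,104 examples gives exactly 780 bytes per example on average. The sketch below turns that average into a rough upper bound on the padded sequence length and adds a per-row consistency check. It rests on two assumptions that the README does not confirm:

- `input_ids` is the int32 column, and `token_type_ids` and `attention_mask` are int8.
- Every row is padded to the same length.

`bytes_per_example`, `max_padded_length`, and `check_row` are hypothetical helper names.

```python
# Figures quoted in the dataset card.
NUM_EXAMPLES = 13_798_104
TOTAL_BYTES = 10_762_521_120

# ASSUMPTION (not stated in the README): input_ids is int32 and the two
# mask-like columns are int8, i.e. 4 + 1 + 1 = 6 bytes per token position.
ASSUMED_DTYPES = {
    "input_ids": "int32",
    "token_type_ids": "int8",
    "attention_mask": "int8",
}
DTYPE_BYTES = {"int8": 1, "int32": 4}
DTYPE_RANGE = {"int8": (-2**7, 2**7 - 1), "int32": (-2**31, 2**31 - 1)}


def bytes_per_example(total_bytes: int = TOTAL_BYTES, n: int = NUM_EXAMPLES) -> float:
    """Average stored bytes per row, from the card's totals."""
    return total_bytes / n


def max_padded_length(dtypes: dict = ASSUMED_DTYPES) -> int:
    """Upper bound on a uniform padded length L (ASSUMPTION: all rows length L).

    Each row needs at least L * (bytes per token) of raw payload, and storage
    overhead such as list offsets can only add bytes, so L cannot exceed
    the average bytes per row divided by the bytes per token.
    """
    per_token = sum(DTYPE_BYTES[d] for d in dtypes.values())
    return int(bytes_per_example() // per_token)


def check_row(row: dict, dtypes: dict = ASSUMED_DTYPES) -> None:
    """Raise ValueError if a row is not a consistent padded example."""
    # All three sequence columns should have the same (padded) length.
    lengths = {name: len(row[name]) for name in dtypes}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"column lengths differ: {lengths}")
    # Every value must fit the (assumed) storage type of its column.
    for name, dtype in dtypes.items():
        lo, hi = DTYPE_RANGE[dtype]
        if any(not lo <= v <= hi for v in row[name]):
            raise ValueError(f"{name} has values outside the {dtype} range")
    # An attention mask marks real tokens with 1 and padding with 0.
    if any(v not in (0, 1) for v in row["attention_mask"]):
        raise ValueError("attention_mask must contain only 0 and 1")
```

Under these assumptions, the 780-byte average caps a uniform padded length at 130 positions. If `token_type_ids` were also int32, the same bound would drop to 86. Running `check_row` on a few streamed rows is a cheap way to catch rows whose columns disagree in length or hold values that the assumed types could not store.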
Provided by: AbdallahhSaleh