AbdallahhSaleh/asd-0-tokenized
Source: Hugging Face. Updated: 2025-03-07. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/AbdallahhSaleh/asd-0-tokenized
Description:
No description is provided for this dataset beyond its schema. Each example contains three features: input ID sequences (input_ids), token type ID sequences (token_type_ids), and attention masks (attention_mask). This is a common input format for natural language processing models, used for example in language model pre-training or text classification. The dataset has a single training split of approximately 12.76 million examples, totaling 4.8 GB.
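To make the three features concrete, here is a minimal sketch of how one record with these fields is typically laid out. It is not taken from the dataset itself: the tokenizer used to build it is not stated, so the token IDs, the padding ID of 0, the maximum length, and the helper name `build_features` are all hypothetical.

```python
def build_features(segment_a, segment_b=None, max_length=8, pad_id=0):
    """Pack one or two lists of token IDs into fixed-length model inputs.

    Hypothetical helper for illustration; pad_id=0 and max_length=8 are
    assumptions, not properties documented for this dataset.
    """
    input_ids = list(segment_a)
    token_type_ids = [0] * len(segment_a)  # first segment is marked 0
    if segment_b:
        input_ids += list(segment_b)
        token_type_ids += [1] * len(segment_b)  # second segment is marked 1

    # Truncate to the maximum length, then mark real tokens with 1.
    input_ids = input_ids[:max_length]
    token_type_ids = token_type_ids[:max_length]
    attention_mask = [1] * len(input_ids)

    # Pad all three sequences to the same fixed length; padding gets mask 0.
    pad = max_length - len(input_ids)
    return {
        "input_ids": input_ids + [pad_id] * pad,
        "token_type_ids": token_type_ids + [0] * pad,
        "attention_mask": attention_mask + [0] * pad,
    }


# Made-up token IDs standing in for two short text segments.
example = build_features([101, 7592, 102], [2088, 102], max_length=8)
for name, values in example.items():
    print(name, values)
```

All three sequences in a record have the same length, and the attention mask tells the model which positions hold real tokens rather than padding. To inspect actual records, the Hugging Face `datasets` library's `load_dataset` function with `streaming=True` should avoid downloading the full 4.8 GB up front.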
Provider:
AbdallahhSaleh



