AbdallahhSaleh/asd-8-tokenized-padded
Source: Hugging Face. Updated 2025-03-19. Indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/AbdallahhSaleh/asd-8-tokenized-padded
Description:
This dataset is a training split with three fields per sample: input_ids, token_type_ids, and attention_mask. It holds roughly 13.9 million samples (13,926,544 as listed) and is about 10 GB in size. It is suited to NLP tasks such as text classification and named entity recognition. input_ids are the indices of words or subwords in the tokenizer vocabulary, token_type_ids distinguish the different segments of an input (for example, the first and second sentence of a pair), and attention_mask marks which positions hold real tokens and which are padding, so the end of the valid input is where the mask switches from 1 to 0.
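To make the three fields concrete, here is a minimal sketch of how one padded sample could be assembled. It does not reproduce how this dataset was built: the helper `pad_example`, the padding ID `PAD_ID = 0`, the token IDs, and the length of 8 are all assumptions chosen for illustration, and the real values depend on the tokenizer the author used.

```python
from typing import Dict, List

# Assumption: 0 is the padding token ID. The actual ID depends on the tokenizer.
PAD_ID = 0


def pad_example(segment_a: List[int], segment_b: List[int],
                max_length: int) -> Dict[str, List[int]]:
    """Build input_ids, token_type_ids and attention_mask for one sample.

    Hypothetical helper for illustration only.
    """
    tokens = segment_a + segment_b
    if len(tokens) > max_length:
        raise ValueError("sequence is longer than max_length")
    n_pad = max_length - len(tokens)
    return {
        # Token indices, right-padded to a fixed length.
        "input_ids": tokens + [PAD_ID] * n_pad,
        # 0 for the first segment, 1 for the second, 0 for padding.
        "token_type_ids": [0] * len(segment_a) + [1] * len(segment_b) + [0] * n_pad,
        # 1 for real tokens, 0 for padding.
        "attention_mask": [1] * len(tokens) + [0] * n_pad,
    }


# Made-up token IDs: a 3-token first segment and a 2-token second segment.
example = pad_example([101, 7592, 102], [2088, 102], max_length=8)
for name, values in example.items():
    print(name, values)
```

Summing attention_mask gives the number of real tokens in a sample (5 in this example), which is one way to recover the unpadded length.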
Provider:
AbdallahhSaleh



