Queeey/miriad-5.8M-raw-passages-tokenized-all-MiniLM-L6-v2
Source: Hugging Face. Updated 2024-12-05; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/Queeey/miriad-5.8M-raw-passages-tokenized-all-MiniLM-L6-v2
Description:
This dataset contains pre-tokenized features for training: input_ids (token ID sequences), attention_mask (attention mask sequences), and overflow_to_sample_mapping (which maps each overflow chunk back to the sample it came from). It has a single split, train, with 16,354,797 samples and a total size of 42,129,957,072 bytes (about 42.1 GB, an average of 2,576 bytes per sample).
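The relationship between the three features can be illustrated with a small sketch. It assumes the rows were produced by a Hugging Face fast tokenizer called with return_overflowing_tokens=True, which is where the field name overflow_to_sample_mapping normally comes from; the listing itself does not confirm this. The repository name mentions 5.8M passages while the split has 16,354,797 rows, which would be consistent with long passages being split into several chunks. The toy rows, token IDs, and helper names below are hypothetical, and whether the stored mapping indices are global or relative to a tokenization batch is not stated in the source.

```python
from collections import defaultdict


def group_chunks_by_passage(rows):
    """Map each source-passage index to the positions of its chunk rows.

    Assumes each row stores overflow_to_sample_mapping as a single integer.
    """
    groups = defaultdict(list)
    for position, row in enumerate(rows):
        groups[row["overflow_to_sample_mapping"]].append(position)
    return dict(groups)


def real_token_count(row):
    """Count non-padding tokens: attention_mask is 1 for real tokens, 0 for padding."""
    return sum(row["attention_mask"])


# Hypothetical toy rows (made-up token IDs, not taken from the dataset).
# Passage 0 was too long and overflowed into two chunks; passage 1 fit in one.
toy_rows = [
    {"input_ids": [101, 7592, 2088, 102], "attention_mask": [1, 1, 1, 1],
     "overflow_to_sample_mapping": 0},
    {"input_ids": [101, 2153, 102, 0], "attention_mask": [1, 1, 1, 0],
     "overflow_to_sample_mapping": 0},
    {"input_ids": [101, 3231, 102, 0], "attention_mask": [1, 1, 1, 0],
     "overflow_to_sample_mapping": 1},
]

groups = group_chunks_by_passage(toy_rows)
for passage_index, positions in groups.items():
    tokens = sum(real_token_count(toy_rows[p]) for p in positions)
    print(f"passage {passage_index}: rows {positions}, {tokens} real tokens")
```

With the real data, rows of the same shape could likely be obtained by streaming the train split through the datasets library's load_dataset function rather than downloading all 42.1 GB up front; that usage is an assumption and has not been checked against this repository.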
Provider:
Queeey



