quintic/c4_codeparrot_3m_3m_mix_9b_tokenized_llama3
Source: Hugging Face · Updated 2024-08-05 · Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/quintic/c4_codeparrot_3m_3m_mix_9b_tokenized_llama3
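As a hedged sketch, the rows can be previewed without fetching the full download by streaming the train split with the Hugging Face `datasets` library. This assumes `datasets` is installed and the Hub is reachable. The helper name `preview_rows` is hypothetical and not part of the dataset. `load_dataset(..., streaming=True)` returns an `IterableDataset`, and `.take(n)` limits iteration to the first `n` rows.

```python
def preview_rows(n: int = 3):
    """Hypothetical helper: return (sequence length, overflow_to_sample_mapping)
    for the first n rows of the train split, streamed rather than downloaded."""
    # Imported inside the function so this file loads even without `datasets`.
    from datasets import load_dataset

    ds = load_dataset(
        "quintic/c4_codeparrot_3m_3m_mix_9b_tokenized_llama3",
        split="train",
        streaming=True,  # IterableDataset: rows are fetched on demand
    )
    return [
        (len(row["input_ids"]), row["overflow_to_sample_mapping"])
        for row in ds.take(n)
    ]
```

Calling `preview_rows(5)` performs network requests. If huggingface.co is unreachable, a common approach is to point the `HF_ENDPOINT` environment variable at the mirror host before importing `datasets`.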
Description:
The dataset has one split, train, containing 4,703,401 samples with a total size of 38,586,701,804 bytes (about 38.6 GB). Its features are input_ids and overflow_to_sample_mapping, which store the sequence data and the sample-mapping information, respectively. The download size is 13,461,970,268 bytes (about 13.5 GB).
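The column name overflow_to_sample_mapping matches what Hugging Face fast tokenizers emit when `return_overflowing_tokens=True`. One plausible reading, which the listing does not confirm, is that long documents were split into several input_ids chunks and this column records which source document each chunk came from. The sketch below assumes each row holds a scalar integer mapping and that chunks of the same source sit next to each other. Under those assumptions, it regroups chunks by source using `itertools.groupby`. The function name and toy rows are hypothetical.

```python
from itertools import groupby
from typing import Iterable, Iterator


def regroup_by_source(rows: Iterable[dict]) -> Iterator[tuple[int, list[list[int]]]]:
    """Yield (source_index, [input_ids chunks]) for each run of consecutive rows
    sharing the same overflow_to_sample_mapping value.

    Assumes (hypothetically) a scalar int mapping per row and that chunks from
    one source sample are stored contiguously."""
    for source_idx, group in groupby(rows, key=lambda r: r["overflow_to_sample_mapping"]):
        yield source_idx, [row["input_ids"] for row in group]


# Toy rows standing in for real records; the token values are made up.
toy_rows = [
    {"input_ids": [1, 2, 3], "overflow_to_sample_mapping": 0},
    {"input_ids": [4, 5], "overflow_to_sample_mapping": 0},
    {"input_ids": [6, 7, 8], "overflow_to_sample_mapping": 1},
]
grouped = list(regroup_by_source(toy_rows))
print(grouped)  # [(0, [[1, 2, 3], [4, 5]]), (1, [[6, 7, 8]])]
```

`groupby` only merges adjacent rows. This keeps memory constant on a streamed split, but it would split one source into several groups if its chunks were not stored contiguously.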
Provided by:
quintic



