Juliushanhanhan/pile-100k-llama3-tokenized-cxt-1024
Source: Hugging Face. Updated: 2024-07-07. Indexed: 2024-07-22.
Download link:
https://hf-mirror.com/datasets/Juliushanhanhan/pile-100k-llama3-tokenized-cxt-1024
Description:
The dataset contains 100,000 training samples, each a sequence of int64 token IDs. The training split totals 812,993,216 bytes (about 813 MB), and the download is 198,145,986 bytes (about 198 MB). The dataset configuration maps the training data files to the path pattern data/train-*.
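As a rough sanity check, the figures above can be turned into a few derived quantities. This is a minimal sketch that assumes the reported training-set size counts only the 8-byte int64 token payload (no per-row offsets or metadata). The 1,024-token context length is read from the "cxt-1024" suffix in the dataset name, and the shard filenames are hypothetical examples, not the repository's actual files.

```python
# Back-of-the-envelope checks on the figures listed for this dataset.
# Assumption: DATASET_BYTES counts only int64 token payload (8 bytes per token).
import fnmatch

NUM_SAMPLES = 100_000
DATASET_BYTES = 812_993_216
DOWNLOAD_BYTES = 198_145_986
BYTES_PER_INT64 = 8
CONTEXT_LENGTH = 1024  # inferred from the "cxt-1024" suffix in the dataset name

# Total tokens and mean tokens per sample under the payload-only assumption.
total_tokens = DATASET_BYTES // BYTES_PER_INT64
avg_tokens_per_sample = total_tokens / NUM_SAMPLES

# How much smaller the download is than the reported training-set size.
compression_ratio = DATASET_BYTES / DOWNLOAD_BYTES

# Size the split would have if every sample filled the full context window.
full_context_bytes = NUM_SAMPLES * CONTEXT_LENGTH * BYTES_PER_INT64

# Hypothetical shard names, used only to show what the data/train-* pattern selects.
shards = [
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
    "data/test-00000-of-00001.parquet",
]
train_shards = [s for s in shards if fnmatch.fnmatchcase(s, "data/train-*")]

print(total_tokens, round(avg_tokens_per_sample, 2), round(compression_ratio, 2))
print(full_context_bytes, train_shards)
```

Under that assumption the split holds 101,624,152 tokens, about 1,016 per sample, and the reported size is smaller than the 819,200,000 bytes that 100,000 full 1,024-token rows would occupy; the download is roughly 4.1 times smaller than the reported size. To fetch the data itself, load_dataset("Juliushanhanhan/pile-100k-llama3-tokenized-cxt-1024", split="train") from the Hugging Face datasets library should resolve the data/train-* files; the column name is not stated on this page.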
Provided by:
Juliushanhanhan



