CHINAYWX/g_tokenizered
Source: Hugging Face. Updated: 2024-09-25. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/CHINAYWX/g_tokenizered
Description:
The dataset has three features: input_ids (sequence of int32), attention_mask (sequence of int8), and labels (sequence of int64). It is split into a training set of 1,000,000 examples and a test set of 10,000 examples. The download size is 145,505,796 bytes, and the total size is 499,171,017 bytes.
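The description above names the features and split sizes; a minimal sketch of how one might load the dataset and check them follows. It assumes the Hugging Face `datasets` library is installed, that the repository id is `CHINAYWX/g_tokenizered` (taken from the download link), and that the splits are named `train` and `test`, which the listing does not state. The helper names `missing_features` and `load_splits` are hypothetical, introduced only for illustration.

```python
# Feature names and split sizes as given in the description.
EXPECTED_FEATURES = ("input_ids", "attention_mask", "labels")
# Assumption: the split names "train" and "test" are not confirmed by the listing.
EXPECTED_ROWS = {"train": 1_000_000, "test": 10_000}


def missing_features(example):
    """Return the expected feature names that are absent from one example."""
    return [name for name in EXPECTED_FEATURES if name not in example]


def load_splits(repo_id="CHINAYWX/g_tokenizered"):
    """Download the dataset and return its splits (requires network access)."""
    # Imported lazily so missing_features works without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id)


# Example usage (requires network access):
#   splits = load_splits()
#   for name, expected in EXPECTED_ROWS.items():
#       print(name, splits[name].num_rows, "rows; expected", expected)
#   print(missing_features(splits["train"][0]))
```

Since the download link points at a mirror, setting the `HF_ENDPOINT` environment variable to `https://hf-mirror.com` before running may be needed to fetch from that mirror rather than the default hub; this is an assumption about the reader's setup, not something the listing specifies.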
Provider:
CHINAYWX



