YnezT/gpt2-directions-random_walk_fix_len-10M-neighbour-2
Source: Hugging Face. Updated: 2024-09-08. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/YnezT/gpt2-directions-random_walk_fix_len-10M-neighbour-2
Description:
The dataset has three features: input_ids (sequence of int32), attention_mask (sequence of int8), and labels (sequence of int64). It is split into a training set of 10,000,000 samples (17,150,000,000 bytes) and a test set of 200,000 samples (343,000,000 bytes). The download size is 1,948,560,536 bytes, and the total dataset size is 17,493,000,000 bytes. The feature names and data types suggest that the dataset is intended for natural language processing (NLP) tasks, such as text classification or sequence labeling.
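The figures in the description can be cross-checked with a little arithmetic. The Python sketch below is illustrative only: the constant names and both helper functions are hypothetical, and `load_test_split` assumes that the Hugging Face `datasets` library is installed, that the repository is reachable, and that the test set is exposed under the split name "test".

```python
# Figures copied from the description above.
REPO_ID = "YnezT/gpt2-directions-random_walk_fix_len-10M-neighbour-2"
TRAIN_EXAMPLES = 10_000_000
TEST_EXAMPLES = 200_000
TRAIN_BYTES = 17_150_000_000
TEST_BYTES = 343_000_000
DOWNLOAD_BYTES = 1_948_560_536
TOTAL_BYTES = 17_493_000_000


def bytes_per_example(num_bytes: int, num_examples: int) -> float:
    """Average stored size of one example, in bytes."""
    return num_bytes / num_examples


def load_test_split():
    """Hypothetical helper: stream the test split without a full download.

    Assumes the `datasets` library is installed and that the split
    is named "test" (an assumption, not stated in the listing).
    """
    from datasets import load_dataset  # imported lazily; needs network access

    return load_dataset(REPO_ID, split="test", streaming=True)


if __name__ == "__main__":
    # The two split sizes should add up to the reported total.
    print("sizes consistent:", TRAIN_BYTES + TEST_BYTES == TOTAL_BYTES)
    # Equal per-example sizes in both splits would be consistent with
    # fixed-length sequences.
    print("train bytes/example:", bytes_per_example(TRAIN_BYTES, TRAIN_EXAMPLES))
    print("test bytes/example:", bytes_per_example(TEST_BYTES, TEST_EXAMPLES))
    # Ratio of dataset size to download size.
    print("size/download ratio:", round(TOTAL_BYTES / DOWNLOAD_BYTES, 2))
```

Streaming is used in the loading helper because the dataset is about 17.5 GB in total; iterating over a streamed split avoids materializing it on disk first.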
Provider:
YnezT



