alea-institute/kl3m-data-dotgov-www.usbr.gov
Source: Hugging Face. Updated: 2025-02-02. Indexed: 2025-02-15.
Download link:
https://hf-mirror.com/datasets/alea-institute/kl3m-data-dotgov-www.usbr.gov
Description:
The dataset has four fields: identifier, dataset, mime_type, and tokens, where tokens is a sequence of int64 values. It has a single train split of 90,114 examples. The download size is 1,639,816,456 bytes and the dataset size is 16,962,166,147 bytes. Under the default configuration, the training data files match the path data/train-*.
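The schema described here can be explored with a short script. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that the repository is reachable on the Hugging Face Hub under the ID shown in the title. The helper names `stream_records` and `summarize` are hypothetical and introduced only for illustration. Streaming is used so the full download is not required just to inspect a few records.

```python
from collections import Counter
from typing import Any, Dict, Iterable

# Repository ID taken from the listing title.
REPO_ID = "alea-institute/kl3m-data-dotgov-www.usbr.gov"


def stream_records(repo_id: str = REPO_ID):
    """Return a lazy iterator over the train split.

    Assumes `pip install datasets` and network access. The import is done
    inside the function so that `summarize` below also works offline.
    """
    from datasets import load_dataset

    return load_dataset(repo_id, split="train", streaming=True)


def summarize(records: Iterable[Dict[str, Any]], limit: int = 100) -> Dict[str, Any]:
    """Count examples, tokens, and MIME types over the first `limit` records.

    Each record is expected to carry the fields named in the description:
    identifier, dataset, mime_type, and tokens (a sequence of integers).
    """
    mime_counts: Counter = Counter()
    total_tokens = 0
    seen = 0
    for record in records:
        if seen >= limit:
            break
        mime_counts[record["mime_type"]] += 1
        total_tokens += len(record["tokens"])
        seen += 1
    return {
        "examples": seen,
        "total_tokens": total_tokens,
        "mime_types": dict(mime_counts),
    }


# Example usage (requires network access):
#   print(summarize(stream_records(), limit=50))
```

Because the tokens field holds integer IDs rather than text, recovering the original documents would require the matching tokenizer, which this listing does not name.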
Provider:
alea-institute



