alea-institute/kl3m-data-dotgov-www.usgs.gov
Source: Hugging Face. Updated 2025-02-02. Indexed 2025-02-15.
Download link:
https://hf-mirror.com/datasets/alea-institute/kl3m-data-dotgov-www.usgs.gov
Description:
The dataset has four features: identifier, dataset (the dataset name), mime_type (the MIME type), and tokens (the number of tokens). It has a single training split (train) with 73,769 examples totaling 578,667,091 bytes. The download size is 116,869,990 bytes.
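As a rough illustration of the figures in the description, the sketch below derives the average size per example and the download-to-dataset size ratio, and shows one way the train split might be loaded. The constant names and the helpers `size_summary` and `load_train_split` are hypothetical, introduced here for illustration only. The loader assumes the Hugging Face `datasets` library is installed and that the repository is reachable under the ID shown in the title; neither is confirmed by this listing.

```python
# Figures copied from the description; nothing here is measured.
NUM_EXAMPLES = 73_769
DATASET_BYTES = 578_667_091
DOWNLOAD_BYTES = 116_869_990

# Repository ID as given in the page title.
REPO_ID = "alea-institute/kl3m-data-dotgov-www.usgs.gov"

# Feature names as listed in the description.
EXPECTED_FEATURES = ("identifier", "dataset", "mime_type", "tokens")


def size_summary() -> dict:
    """Derive simple size statistics from the quoted figures."""
    return {
        "avg_bytes_per_example": DATASET_BYTES / NUM_EXAMPLES,
        "download_ratio": DOWNLOAD_BYTES / DATASET_BYTES,
    }


def load_train_split():
    """Load the train split (assumption: `datasets` is installed and the
    repository is publicly accessible; this triggers a network download)."""
    from datasets import load_dataset  # imported lazily so the rest runs offline

    return load_dataset(REPO_ID, split="train")


if __name__ == "__main__":
    summary = size_summary()
    print(f"average bytes per example: {summary['avg_bytes_per_example']:.1f}")
    print(f"download / dataset size:   {summary['download_ratio']:.3f}")
```

The arithmetic runs offline. Calling `load_train_split()` downloads data, so it is left out of the default run.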
Provider:
alea-institute



