EleutherAI/filtering-pretraining-mix_20250516-0250
Source: Hugging Face | Updated: 2025-05-22 | Indexed: 2025-05-31
Download link:
https://hf-mirror.com/datasets/EleutherAI/filtering-pretraining-mix_20250516-0250
Description:
The dataset contains the fields id, word_filter, word_filter_metadata, bert_filter, bert_filter_metadata, and combined_filter. It provides a training split (train), for which the byte size and number of examples are given. The metadata also records the dataset's download size and total size, along with the default configuration and the file paths of the training data.
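As a rough illustration of how the filter fields listed above might be inspected, the following Python sketch counts how many rows have each flag set. It assumes the three flag fields hold boolean values, which the listing does not confirm. The helper names (count_flags, load_train_split) and the sample rows are hypothetical and not taken from the dataset. The load_dataset call belongs to the Hugging Face datasets library and needs network access, so it is wrapped in a function that is not run here.

```python
from collections import Counter
from typing import Any, Dict, Iterable

# Flag fields named in the listing; treating them as booleans is an assumption.
FLAG_FIELDS = ("word_filter", "bert_filter", "combined_filter")


def count_flags(rows: Iterable[Dict[str, Any]]) -> Counter:
    """Count the rows seen and, per flag field, the rows where it is truthy."""
    counts: Counter = Counter()
    for row in rows:
        counts["rows"] += 1
        for field in FLAG_FIELDS:
            if row.get(field):
                counts[field] += 1
    return counts


def load_train_split():
    """Load the train split from the Hub (requires `datasets` and network access)."""
    from datasets import load_dataset  # lazy import so the rest runs offline

    return load_dataset(
        "EleutherAI/filtering-pretraining-mix_20250516-0250", split="train"
    )


# Hypothetical rows for illustration only; real values may differ.
sample_rows = [
    {"id": "a", "word_filter": True, "bert_filter": False, "combined_filter": False},
    {"id": "b", "word_filter": True, "bert_filter": True, "combined_filter": True},
    {"id": "c", "word_filter": False, "bert_filter": False, "combined_filter": False},
]
summary = count_flags(sample_rows)
```

To try this on the real data, pass the result of load_train_split() to count_flags instead of sample_rows. How combined_filter is derived from the other two flags is not stated in the listing, so the sketch only counts each flag independently.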
Provider:
EleutherAI



