PursuitOfDataScience/processed-fineweb-edu
Hugging Face · Updated 2025-02-27 · Indexed 2025-04-12
Download link:
https://hf-mirror.com/datasets/PursuitOfDataScience/processed-fineweb-edu
Description:
This dataset is a processed version of the FineWeb-Edu dataset, intended for language model training and NLP research. It has been tokenized and truncated to a specified block size (e.g., 2048), making it ready for pre-training or evaluating transformer-based language models.
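The description does not state which tokenizer was used, nor whether documents longer than the block size were split into several blocks or simply cut off. As a minimal sketch, assuming per-document truncation of already-tokenized ids, the truncation step might look like this (the helpers `truncate_to_block` and `truncate_batch` are hypothetical, not part of the dataset's published tooling):

```python
from typing import List

BLOCK_SIZE = 2048  # example block size mentioned in the dataset description


def truncate_to_block(token_ids: List[int], block_size: int = BLOCK_SIZE) -> List[int]:
    """Keep at most `block_size` tokens from the start of one tokenized document."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return token_ids[:block_size]


def truncate_batch(batch: List[List[int]], block_size: int = BLOCK_SIZE) -> List[List[int]]:
    """Apply the same truncation to every document in a batch."""
    return [truncate_to_block(ids, block_size) for ids in batch]


if __name__ == "__main__":
    # Hypothetical token ids standing in for real tokenizer output.
    docs = [list(range(5000)), list(range(100))]
    out = truncate_batch(docs)
    print([len(d) for d in out])  # [2048, 100]
```

Documents shorter than the block size pass through unchanged here; whether the actual dataset pads short documents is not stated in the description.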
Provided by:
PursuitOfDataScience



