mlx-community/recycling_the_web-100K
Source: Hugging Face. Updated 2025-09-04. Indexed 2025-09-13.
Download link:
https://hf-mirror.com/datasets/mlx-community/recycling_the_web-100K
Description:
Recycling the Web is an English text dataset curated by Thao Nguyen, intended to improve both the quality and the quantity of pre-training data for language models. This release is a subset of the facebook/recycling_the_web dataset prepared for the MLX community, and subsets are offered in several sizes: 1k, 100k, 200k, 400k, and 1m. The dataset is licensed under CC BY-NC.
Provider:
mlx-community



