TokenHaven/FineWeb-Edu-Spanish
Source: Hugging Face | Updated: 2025-07-30 | Indexed: 2025-11-30
Download link:
https://hf-mirror.com/datasets/TokenHaven/FineWeb-Edu-Spanish
Description:
This dataset contains high-quality Spanish text translated from English with the Qwen3-235B-A22B large language model. The data was filtered with the FineWeb-Edu classifier for educational quality and classified with the WebOrganizer classifiers by content format and topic. The dataset includes metadata such as URLs, dates, and language scores, and it covers a variety of content formats and topics. Duplicates were removed with MinHash-based deduplication.
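The card does not say how the MinHash deduplication was configured. As an illustration of the technique only, the following is a minimal pure-Python sketch that assumes word 5-gram shingles, 64 seeded hash functions, and a similarity threshold of 0.8. All of these values and the function names are hypothetical and are not taken from the dataset's pipeline. Each document's signature keeps the minimum hash per seed, and the fraction of matching slots between two signatures estimates the Jaccard similarity of their shingle sets.

```python
import hashlib
import re


def shingles(text: str, k: int = 5) -> set[str]:
    """Return the set of word k-grams (shingles) of a lowercased text."""
    words = re.findall(r"\w+", text.lower())  # \w is Unicode-aware, so "educación" stays one word
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(text: str, num_perm: int = 64, k: int = 5) -> list[int]:
    """For each seeded hash function, keep the minimum hash over all shingles.

    Seeded 64-bit BLAKE2b hashes stand in for the random permutations of classic MinHash.
    """
    grams = [g.encode("utf-8") for g in shingles(text, k)]
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a different salt gives a different hash function
        hashes = (
            int.from_bytes(hashlib.blake2b(g, digest_size=8, salt=salt).digest(), "little")
            for g in grams
        )
        signature.append(min(hashes, default=0))  # empty text -> all-zero signature
    return signature


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """The fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def deduplicate(docs: list[str], threshold: float = 0.8, num_perm: int = 64) -> list[str]:
    """Keep a document only if it is below `threshold` similarity to every kept document."""
    kept: list[str] = []
    kept_sigs: list[list[int]] = []
    for doc in docs:
        sig = minhash_signature(doc, num_perm)
        if all(estimated_jaccard(sig, s) < threshold for s in kept_sigs):
            kept.append(doc)
            kept_sigs.append(sig)
    return kept
```

For example, `deduplicate([doc, copy_of_doc, other_doc])` keeps the first document and drops the copy. This sketch compares every pair of signatures, which is quadratic in the number of documents. Web-scale deduplication typically adds locality-sensitive hashing (LSH) banding so that only candidate pairs are compared.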
Provided by:
TokenHaven