reflex-ai/fineweb-ultra-mini-pro
Source: Hugging Face | Updated: 2024-12-04 | Indexed: 2024-12-14
Download link:
https://hf-mirror.com/datasets/reflex-ai/fineweb-ultra-mini-pro
Description:
Fineweb Ultra Mini is a dataset derived from the original FineWeb dataset published by Hugging Face. It focuses on extracting high-quality data, amounting to roughly 0.5-1% of the original dataset. The dataset was curated by training a text classification model to identify the top 2-3% of documents, which were then processed to ensure consistency and quality. It is intended for a variety of natural language processing tasks, including language modeling, text summarization, question answering, and text classification.
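The selection step described above (keeping only the top few percent of documents ranked by a classifier score) can be sketched as follows. This is a minimal illustration, not the curators' actual pipeline: the `top_percent` helper, the document names, and the scores are all hypothetical, and the real classifier and thresholds are not given in this listing.

```python
def top_percent(docs, scores, percent):
    """Keep the documents whose scores rank in the top `percent` percent.

    Hypothetical helper for illustration only. `scores` stands in for the
    output of a quality classifier, which is not reproduced here.
    """
    if len(docs) != len(scores):
        raise ValueError("docs and scores must have the same length")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    if not docs:
        return []
    # Integer arithmetic avoids floating-point rounding; keep at least one doc.
    keep = max(1, len(docs) * percent // 100)
    # Rank document indices from highest score to lowest.
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    # Restore the original document order among the kept items.
    kept = sorted(ranked[:keep])
    return [docs[i] for i in kept]


# Toy data: 100 placeholder documents with made-up, increasing scores.
docs = [f"doc-{i}" for i in range(100)]
scores = [i / 100 for i in range(100)]
selected = top_percent(docs, scores, 3)
```

With these toy inputs, `selected` holds the three highest-scoring placeholder documents. A real pipeline would replace the made-up scores with classifier predictions and would likely stream the data rather than sort it in memory.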
Provider:
reflex-ai



