reflex-ai/fineweb-ultra-mini-pro

Source: Hugging Face. Updated 2024-12-04; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/reflex-ai/fineweb-ultra-mini-pro
Description:
Fineweb Ultra Mini is a dataset derived from Hugging Face's original FineWeb dataset, focusing on high-quality data from the 0.5-1% range of the original dataset. It was curated by training a text classification model to identify the top 2-3% of documents, which were then processed to ensure consistency and quality. It is intended for a variety of natural language processing tasks, including language modeling, text summarization, question answering, and text classification.
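
The description says a trained text classifier was used to pick out the top few percent of documents. The sketch below illustrates that general idea only: it ranks documents by a score and keeps the highest-scoring fraction. The names `select_top_fraction` and `toy_quality_score` are hypothetical, and the toy scorer is a stand-in for a real classifier; nothing here reproduces the dataset authors' actual model or pipeline.

```python
import math
from typing import Callable, List


def select_top_fraction(
    docs: List[str],
    score_fn: Callable[[str], float],
    fraction: float,
) -> List[str]:
    """Return the highest-scoring `fraction` of docs, best first.

    Keeps at least one document when `docs` is non-empty.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not docs:
        return []
    keep = max(1, math.floor(len(docs) * fraction))
    # sorted() is stable, so ties keep their original relative order.
    ranked = sorted(docs, key=score_fn, reverse=True)
    return ranked[:keep]


def toy_quality_score(text: str) -> float:
    # Hypothetical stand-in for a trained quality classifier:
    # scores a document by its number of distinct words.
    return float(len(set(text.lower().split())))
```

In practice `score_fn` would wrap a trained classifier's predicted quality score, and `fraction` would be set to something like 0.02 to 0.03 to match the "top 2-3%" figure in the description.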
Provider:
reflex-ai