pszemraj/fineweb-CC-MAIN-2024-10-insurance-700k-dedup-minified
Source: Hugging Face. Updated: 2025-01-18. Indexed: 2025-02-15.
Download link:
https://hf-mirror.com/datasets/pszemraj/fineweb-CC-MAIN-2024-10-insurance-700k-dedup-minified
Description:
The dataset has two configurations, modernbert-embed-base and rover_nexus. Each configuration contains the fields text, URL, token_count, and cluster_id. The dataset is intended for text generation tasks, provides a training split for machine learning models, and is licensed under odc-by.
Provider:
pszemraj
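
Example usage:
The configurations and fields described above could be accessed roughly as follows. This is a minimal sketch, not an official recipe: it assumes the Hugging Face `datasets` library is installed, that the repository id matches the path in the download link, and that the training split is named "train". The helper names `load_config` and `tokens_per_cluster` are hypothetical and were introduced only for illustration.

```python
from collections import defaultdict

# Repository id taken from the download link; config names from the description.
REPO_ID = "pszemraj/fineweb-CC-MAIN-2024-10-insurance-700k-dedup-minified"
CONFIGS = ("modernbert-embed-base", "rover_nexus")


def load_config(name):
    """Load the training split of one configuration (hypothetical helper)."""
    if name not in CONFIGS:
        raise ValueError(f"unknown configuration: {name!r}")
    # Imported lazily so the rest of the module works without the library.
    from datasets import load_dataset

    # Assumption: the split is called "train".
    return load_dataset(REPO_ID, name, split="train")


def tokens_per_cluster(rows):
    """Sum token_count for each cluster_id over an iterable of row dicts."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["cluster_id"]] += row["token_count"]
    return dict(totals)
```

Calling `tokens_per_cluster(load_config("rover_nexus"))` would then give a rough view of how many tokens each cluster holds, assuming the field names match the description exactly.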



