airtrain-ai/fineweb-edu-fortified
Source: Hugging Face. Updated: 2024-08-08. Indexed: 2024-12-14.
Download link:
https://hf-mirror.com/datasets/airtrain-ai/fineweb-edu-fortified
Overview:
This dataset targets text generation tasks. It is published as multiple configurations, each of which documents its features: text, id, dump, url, file_path, language, language_score, token_count, score, int_score, embedding, and count. Each configuration has a train split with a stated size in bytes and number of examples. The dataset is licensed under ODC-BY.
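As a rough illustration of working with the features listed above, the following Python sketch checks that a record carries every listed field and keeps only records at or above a chosen int_score. The helper names (`stream_train_split`, `missing_fields`, `filter_by_int_score`) are hypothetical and not part of the dataset. The configuration name is left as a parameter because the listing does not name the configurations. Streaming assumes the Hugging Face `datasets` library is installed and the network is reachable.

```python
from typing import Any, Dict, Iterable, Iterator, List

# Field names exactly as listed in the dataset description.
EXPECTED_FIELDS = (
    "text", "id", "dump", "url", "file_path", "language",
    "language_score", "token_count", "score", "int_score",
    "embedding", "count",
)


def stream_train_split(config_name: str) -> Iterable[Dict[str, Any]]:
    """Lazily stream the train split of one configuration.

    Assumption: `config_name` is a valid configuration of the dataset;
    the listing does not say what the configurations are called.
    """
    # Imported here so the pure-Python helpers below work without it.
    from datasets import load_dataset

    return load_dataset(
        "airtrain-ai/fineweb-edu-fortified",
        name=config_name,
        split="train",
        streaming=True,  # avoid downloading the whole split up front
    )


def missing_fields(record: Dict[str, Any]) -> List[str]:
    """Return the listed field names that are absent from `record`."""
    return [name for name in EXPECTED_FIELDS if name not in record]


def filter_by_int_score(
    records: Iterable[Dict[str, Any]], min_int_score: int
) -> Iterator[Dict[str, Any]]:
    """Yield complete records whose int_score is at least `min_int_score`."""
    for record in records:
        if not missing_fields(record) and record["int_score"] >= min_int_score:
            yield record
```

A possible use would be `filter_by_int_score(stream_train_split(name), 3)` for some configuration `name`. The threshold 3 is an arbitrary example, not a value taken from the dataset card.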
Provided by:
airtrain-ai



