kaitchup/fineweb-2-ben_Beng-sample-10k
Source: Hugging Face. Updated 2025-06-26. Indexed 2025-07-05.
Download link:
https://hf-mirror.com/datasets/kaitchup/fineweb-2-ben_Beng-sample-10k
Description:
This dataset is a training set of 10,000 text examples, 72,464,512 bytes in total. Each example has a unique identifier along with its source URL, date, and file path. Each text also carries language metadata: the language type, a language score, and the language script.
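As a rough illustration of the figures above, the following sketch computes the average size per example and shows one way the data might be loaded. It assumes the Hugging Face `datasets` library is installed and that the split is named `train`; the helper names `average_bytes_per_example` and `load_sample` are hypothetical and do not come from the dataset page.

```python
# Minimal sketch: sanity-check the stated size figures and optionally load the data.
# Assumptions (not confirmed by the listing): the `datasets` library is available
# and the only split is called "train".

DATASET_ID = "kaitchup/fineweb-2-ben_Beng-sample-10k"
TOTAL_BYTES = 72_464_512   # dataset size stated in the description
NUM_EXAMPLES = 10_000      # number of training examples stated in the description


def average_bytes_per_example(total_bytes: int, num_examples: int) -> float:
    """Return the mean number of bytes per example."""
    if num_examples <= 0:
        raise ValueError("num_examples must be positive")
    return total_bytes / num_examples


def load_sample():
    """Download the dataset from the Hugging Face Hub (requires network access)."""
    # Imported lazily so the arithmetic above works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split="train")


if __name__ == "__main__":
    avg = average_bytes_per_example(TOTAL_BYTES, NUM_EXAMPLES)
    print(f"Average size per example: {avg:.1f} bytes")
```

Calling `load_sample()` and then inspecting `ds.column_names` and `len(ds)` on the result would be one way to check the fields and example count against the description.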
Provider:
kaitchup



