data-is-better-together/fineweb2-2k-samples
Source: Hugging Face. Updated: 2025-03-26. Indexed: 2025-04-12.
Download link:
https://hf-mirror.com/datasets/data-is-better-together/fineweb2-2k-samples
Description:
A multilingual text dataset. Each record carries its text together with metadata such as an ID, URL, date, and file path. The dataset is divided into multiple configurations, each with its own training split; the split size and number of examples differ from one configuration to another. Supported languages include Latin and Arabic.
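The record layout and per-configuration splits described above can be sketched as follows. This is a minimal illustration, not the dataset's actual schema: the field names (`id`, `url`, `date`, `file_path`, `text`), the configuration names (`config_a`, `config_b`), and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """One text example with its metadata (field names are assumed)."""
    id: str
    url: str
    date: str
    file_path: str
    text: str


def examples_per_config(splits: dict[str, list[Record]]) -> dict[str, int]:
    """Return the number of training examples in each configuration."""
    return {name: len(rows) for name, rows in splits.items()}


# Hypothetical data: two configurations with training splits of different sizes.
sample = {
    "config_a": [
        Record("a-1", "https://example.com/1", "2024-01-01", "path/a1", "first text"),
        Record("a-2", "https://example.com/2", "2024-01-02", "path/a2", "second text"),
    ],
    "config_b": [
        Record("b-1", "https://example.com/3", "2024-01-03", "path/b1", "third text"),
    ],
}

counts = examples_per_config(sample)
print(counts)
```

With the Hugging Face `datasets` library, the real configuration names would normally be looked up first (for example with `get_dataset_config_names`) and then passed to `load_dataset` together with `split="train"`; the actual names and fields should be checked on the dataset page.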
Provider:
data-is-better-together



