ChavyvAkvar/fineweb-2-1M-Sample-Thai
Source: Hugging Face. Updated: 2025-09-27. Indexed: 2025-10-25.
Download link:
https://hf-mirror.com/datasets/ChavyvAkvar/fineweb-2-1M-Sample-Thai
Description:
The dataset contains text content along with metadata fields such as unique identifiers, URLs, dates, and file paths. It has a single train split of 1 million examples, over 8GB in size. The dataset supports multiple languages and provides language-identification scores. The default configuration specifies the data file path for the train split.
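Because the train split is over 8GB, streaming it may be more practical than downloading it whole. The following is a minimal sketch, not the provider's own recipe: it assumes the Hugging Face `datasets` package is installed, and the column names `language_score`, `url`, and `text` are assumptions based on the description above rather than confirmed field names. The helper names and the 0.9 threshold are hypothetical.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator

# Repository name taken from the listing.
REPO_ID = "ChavyvAkvar/fineweb-2-1M-Sample-Thai"


def stream_train_split(repo_id: str = REPO_ID) -> Iterable[Dict[str, Any]]:
    """Stream the train split instead of downloading all of it up front."""
    # Imported lazily so keep_confident() stays usable without `datasets`.
    from datasets import load_dataset

    return load_dataset(repo_id, split="train", streaming=True)


def keep_confident(
    records: Iterable[Dict[str, Any]],
    min_score: float = 0.9,                # arbitrary illustrative threshold
    score_field: str = "language_score",   # assumed column name
) -> Iterator[Dict[str, Any]]:
    """Yield records whose language-identification score is at least min_score."""
    for record in records:
        score = record.get(score_field)
        # Records without a score are skipped rather than guessed at.
        if score is not None and score >= min_score:
            yield record


def preview(n: int = 3) -> None:
    """Print the URL and first 80 characters of n confident records (needs network)."""
    for record in islice(keep_confident(stream_train_split()), n):
        # "url" and "text" are assumed column names.
        print(record.get("url"), (record.get("text") or "")[:80])
```

Calling `preview()` fetches only as many records as needed to print the first few matches. If the actual column names differ, pass the correct one through `score_field`.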
Provider:
ChavyvAkvar



