kanwal-mehreen18/c4-splits
Source: Hugging Face. Updated 2025-11-11. Indexed 2025-11-15.
Download link:
https://hf-mirror.com/datasets/kanwal-mehreen18/c4-splits
Description:
The dataset contains multilingual text data, with a training set and a test set for each language, and is suitable for training and evaluating machine learning models. Each example has three features: the text content, a timestamp, and a URL. The dataset totals more than 3.9 GB and over 2.6 million examples.
Provided by:
kanwal-mehreen18
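
The description above mentions per-language train and test sets with text, timestamp, and URL features. A minimal sketch of how one split might be streamed and sanity-checked follows. It assumes the Hugging Face `datasets` library is installed, that the columns are literally named `text`, `timestamp`, and `url`, and that each language is exposed as a configuration name; none of these details are confirmed by the listing, and `iter_valid` and `load_split` are hypothetical helper names.

```python
from typing import Any, Dict, Iterable, Iterator

# Field names taken from the description; the exact column names
# on the Hub are an assumption.
EXPECTED_FIELDS = ("text", "timestamp", "url")


def iter_valid(records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield only records that have every expected field and non-blank text."""
    for record in records:
        has_fields = all(field in record for field in EXPECTED_FIELDS)
        if not has_fields:
            continue
        text = record["text"]
        if isinstance(text, str) and text.strip():
            yield record


def load_split(config: str, split: str = "train"):
    """Stream one split of the dataset.

    `config` is assumed to be a per-language configuration name;
    check the dataset page for the names that actually exist.
    """
    # Imported lazily so iter_valid can be used without the library.
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(
        "kanwal-mehreen18/c4-splits", config, split=split, streaming=True
    )
```

With a valid configuration name, `iter_valid(load_split(name, "test"))` would iterate over the usable test examples without downloading the full 3.9 GB up front, since streaming fetches records on demand.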



