zerostratos/cc2024
Source: Hugging Face | Updated: 2025-03-27 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/zerostratos/cc2024
Description:
The dataset contains fields for text, identifier, URL, date, and file path, plus language-detection features: language type, language score, and language script. It also includes clustering information for the texts. The data is provided as a training split comprising nearly 3 million examples, and the total size is approximately 17 GB.
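As a rough illustration of the record layout described above, the following sketch models one example as a Python dataclass and filters records by language-detection score. The field names (`text`, `id`, `url`, `date`, `file_path`, `language`, `language_score`, `language_script`, `cluster`), the `filter_by_language` helper, and all sample values are assumptions inferred from the description, not the dataset's confirmed column names; check the dataset card for the actual schema.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Record:
    # All field names are hypothetical, inferred from the prose description.
    text: str
    id: str
    url: str
    date: str
    file_path: str
    language: str           # detected language type
    language_score: float   # detector confidence
    language_script: str    # writing script of the text
    cluster: int            # clustering information for the text


def filter_by_language(
    records: Iterable[Record], language: str, min_score: float
) -> List[Record]:
    """Keep records detected as `language` with a score of at least `min_score`."""
    return [
        r for r in records
        if r.language == language and r.language_score >= min_score
    ]


# Made-up sample rows for illustration only; not taken from the dataset.
sample = [
    Record("hello world", "a1", "https://example.com/1", "2024-01-01",
           "dump/0.gz", "en", 0.98, "Latn", 3),
    Record("xin chào", "a2", "https://example.com/2", "2024-01-02",
           "dump/0.gz", "vi", 0.91, "Latn", 7),
    Record("mixed txt", "a3", "https://example.com/3", "2024-01-03",
           "dump/1.gz", "en", 0.42, "Latn", 3),
]

confident_english = filter_by_language(sample, "en", 0.9)
```

Filtering on both the language label and its score is a common way to drop rows where the detector was unsure; the threshold of 0.9 here is arbitrary.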
Provider:
zerostratos



