systemk/culturax-10M
Hugging Face | Updated 2024-11-28 | Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/systemk/culturax-10M
Overview:
This dataset contains text data in multiple languages, including Arabic, Bengali, German, English, Spanish, French, Hindi, Indonesian, Japanese, Marathi, Portuguese, Russian, Swahili, Urdu, and Chinese. Each language subset has four features: text (the text content), timestamp (the timestamp), url (the source URL), and source (the source of the record). Only a training split (train) is included, and the size, download size, and number of examples are provided for each language subset.
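The four-feature layout described above can be checked with a short sketch. This is illustrative only, not an official loader: the helper names `missing_features` and `stream_language` are made up for this example, and the configuration name passed to `load_dataset` (for instance "en") is an assumption that should be checked against the dataset page. The loader requires the third-party `datasets` package; the schema check itself uses only the standard library.

```python
from typing import Any, Iterator, Mapping

# The four feature names listed in the description above.
EXPECTED_FEATURES = ("text", "timestamp", "url", "source")


def missing_features(record: Mapping[str, Any]) -> list[str]:
    """Return the expected feature names that are absent from a record."""
    return [name for name in EXPECTED_FEATURES if name not in record]


def stream_language(config_name: str) -> Iterator[Mapping[str, Any]]:
    """Lazily yield records from one language subset's train split.

    Assumption: config_name (for example "en") matches a configuration
    name on the dataset page; the real names are not stated here.
    """
    # Imported here so the schema check above works without `datasets`.
    from datasets import load_dataset

    dataset = load_dataset(
        "systemk/culturax-10M", config_name, split="train", streaming=True
    )
    for record in dataset:
        yield record


if __name__ == "__main__":
    # Hypothetical record used only to illustrate the schema check.
    sample = {"text": "hello", "timestamp": "", "url": "", "source": ""}
    print(missing_features(sample))  # []
```

Streaming is used in the sketch so that a quick inspection does not have to download a full language subset first; dropping `streaming=True` would download and cache the split instead.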
Provider:
systemk



