konwoo/dclm-200k-docs
Source: Hugging Face | Updated: 2025-11-10 | Indexed: 2025-11-15
Download link:
https://hf-mirror.com/datasets/konwoo/dclm-200k-docs
Description:
A text dataset for model training. Each record carries feature fields such as bff_contained_ngram_count_before_dedupe and language_id_whole_page_fasttext, alongside the text content (text) and the source URL. A metadata field holds per-record details such as timestamps and IP addresses. The dataset consists of a single training split with 200,000 examples.
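
The following is a minimal sketch of how the records described above might be inspected, not an official loader. It assumes the Hugging Face `datasets` library is installed and that the repository ID matches the listing title. The column spellings `url` and `metadata` are assumptions inferred from the description, and the helper names (`missing_fields`, `stream_train_split`) are hypothetical.

```python
from typing import Any, Dict, Iterable, List

# Repository ID taken from the listing title.
REPO_ID = "konwoo/dclm-200k-docs"

# Field names taken from the description. The spellings "url" and
# "metadata" are assumptions and may differ in the actual dataset.
EXPECTED_FIELDS = (
    "bff_contained_ngram_count_before_dedupe",
    "language_id_whole_page_fasttext",
    "text",
    "url",
    "metadata",
)


def missing_fields(record: Dict[str, Any]) -> List[str]:
    """Return the expected field names that are absent from a record."""
    return [name for name in EXPECTED_FIELDS if name not in record]


def stream_train_split() -> Iterable[Dict[str, Any]]:
    """Lazily iterate over the train split without downloading it in full.

    Requires `pip install datasets` and network access.
    """
    # Imported here so that missing_fields() works without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split="train", streaming=True)
```

To check the first record, something like `record = next(iter(stream_train_split()))` followed by `missing_fields(record)` should list any assumed column names that the dataset does not actually use. Streaming is chosen here so that a quick look at the schema does not require fetching all 200,000 examples.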
Provider:
konwoo



