nhagar/obelics_urls
Source: Hugging Face; updated 2025-05-15; indexed 2025-08-30
Download link:
https://hf-mirror.com/datasets/nhagar/obelics_urls
Summary:
This dataset provides the URLs and top-level domains associated with training records in the HuggingFaceM4/OBELICS dataset. It is part of a collection of datasets curated for exploring LLM training datasets. It was created by extracting URLs and top-level domains from the source data, so researchers can analyze the contents of these training datasets without managing large volumes of raw text. Each record has two columns: url and domain.
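Because each record carries only a url and a domain, a common first analysis is counting records per domain. The sketch below assumes rows arrive as mappings with those two keys. The helper names top_domains and urls_for_domain are hypothetical, and the sample rows are toy values, not real data from the dataset.

```python
from collections import Counter
from typing import Iterable, Mapping


def top_domains(records: Iterable[Mapping[str, str]], n: int = 10) -> list[tuple[str, int]]:
    """Count records per value of the 'domain' column and return the n most common.

    Rows with a missing or empty domain are skipped.
    """
    counts = Counter(rec["domain"] for rec in records if rec.get("domain"))
    return counts.most_common(n)


def urls_for_domain(records: Iterable[Mapping[str, str]], domain: str) -> list[str]:
    """Return every 'url' value whose 'domain' column equals the given domain."""
    return [rec["url"] for rec in records if rec.get("domain") == domain]


if __name__ == "__main__":
    # Toy rows shaped like the dataset's two columns (url, domain); not real data.
    sample = [
        {"url": "https://example.com/a", "domain": "example.com"},
        {"url": "https://example.com/b", "domain": "example.com"},
        {"url": "https://news.example.org/x", "domain": "example.org"},
    ]
    print(top_domains(sample, n=2))
    print(urls_for_domain(sample, "example.org"))
```

With the Hugging Face datasets library installed, rows could be streamed with load_dataset("nhagar/obelics_urls", split="train", streaming=True) and passed straight to top_domains, which avoids downloading the full table first. The split name "train" is an assumption; check the dataset card for the actual splits.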
Provided by:
nhagar



