
orionweller/dolma_20_percent_sample

Hugging Face | Updated 2024-05-23 | Indexed 2024-06-12
Download link:
https://hf-mirror.com/datasets/orionweller/dolma_20_percent_sample
Resource overview:
# Example Usage

```
from datasets import load_dataset
import huggingface_hub

for folder_name in huggingface_hub.list_repo_tree("orionweller/dolma_20_percent_sample", repo_type="dataset"):
    # Skip repository metadata files; compare the entry's path, not the entry object
    if folder_name.path in ["README.md", ".gitattributes"]:
        continue
    # Otherwise the entry is a folder for a particular part of Dolma, e.g. `algebraic_stack_train_0000`, total is 2419
    # You can load only one part like this
    dataset = load_dataset("orionweller/dolma_20_percent_sample", data_files={"data": f"{folder_name.path}/*"})["data"]
    # dataset will have these keys: ["id", "text", "added", "created", "source"]
```

This dataset, dolma_20_percent_sample, is split into multiple parts, each stored in its own folder that corresponds to a particular part of Dolma (for example, algebraic_stack_train_0000). Each loaded part has the keys ["id", "text", "added", "created", "source"].
Provided by:
orionweller
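Summary of original information

Dataset overview

Dataset name

  • Name: dolma_20_percent_sample
  • Owner: orionweller

Dataset structure

  • Data files: the dataset consists of multiple parts, each corresponding to one folder, e.g. algebraic_stack_train_0000
  • Data keys: a loaded dataset contains the following keys: ["id", "text", "added", "created", "source"]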
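The key list above can be checked against a loaded row. The following is a minimal sketch only: the helper name `missing_keys` and the sample record are hypothetical, the values are placeholders, and the source does not state the field types.

```python
# Keys the listing says every loaded part exposes.
EXPECTED_KEYS = ("id", "text", "added", "created", "source")


def missing_keys(record):
    """Return the expected keys that are absent from a dict-like record."""
    return [key for key in EXPECTED_KEYS if key not in record]


# Hypothetical record with placeholder values (not real data from the dataset).
sample = {
    "id": "example-0",
    "text": "placeholder text",
    "added": "",
    "created": "",
    "source": "example",
}

print(missing_keys(sample))
print(missing_keys({"id": "x", "text": "y"}))
```

An empty list means the record carries every listed key; a non-empty list names the ones to investigate.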

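Dataset loading

  • Loading method: use the load_dataset function to load a specific part of the dataset from the Hugging Face Hub.

  • Example code (Python):

    from datasets import load_dataset
    import huggingface_hub

    # folder_name is one entry returned by huggingface_hub.list_repo_tree(...)
    dataset = load_dataset("orionweller/dolma_20_percent_sample", data_files={"data": f"{folder_name.path}/*"})["data"]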
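The loading call above depends on a `folder_name` entry taken from the repository listing. A minimal sketch of that selection step follows, assuming each entry exposes a `.path` attribute, as the objects yielded by `huggingface_hub.list_repo_tree` do. The helper names `part_paths`, `data_files_for`, `list_parts`, and `load_part` are hypothetical, and the library imports are deferred into the functions that need network access so the pure helpers can be tried offline.

```python
REPO_ID = "orionweller/dolma_20_percent_sample"
# Repository metadata files that are not dataset parts.
SKIPPED = {"README.md", ".gitattributes"}


def part_paths(entries):
    """Keep the paths of entries that are dataset parts, not metadata files."""
    return [entry.path for entry in entries if entry.path not in SKIPPED]


def data_files_for(folder_path):
    """Build the data_files mapping that selects every file in one part folder."""
    return {"data": f"{folder_path}/*"}


def list_parts():
    """List part folders on the Hub (requires huggingface_hub and network access)."""
    import huggingface_hub

    return part_paths(huggingface_hub.list_repo_tree(REPO_ID, repo_type="dataset"))


def load_part(folder_path):
    """Load a single part (requires the datasets library and network access)."""
    from datasets import load_dataset

    return load_dataset(REPO_ID, data_files=data_files_for(folder_path))["data"]
```

For example, `load_part(list_parts()[0])` would download and load only the first listed part rather than the whole sample.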
