orionweller/dolma_20_percent_sample

Name: orionweller/dolma_20_percent_sample
Creator: orionweller
Published: 2024-05-23 20:23:45
License: No description available

Hugging Face · updated 2024-05-23 · indexed 2024-06-12

Download link:

https://hf-mirror.com/datasets/orionweller/dolma_20_percent_sample

Resource description:

# Example Usage

```
from datasets import load_dataset
import huggingface_hub

for folder in huggingface_hub.list_repo_tree("orionweller/dolma_20_percent_sample", repo_type="dataset"):
    # list_repo_tree yields RepoFile/RepoFolder objects, so compare on .path to skip metadata files
    if folder.path in ["README.md", ".gitattributes"]:
        continue
    # Otherwise this is a folder holding one part of Dolma, e.g. `algebraic_stack_train_0000` (2,419 parts in total).
    # You can load only one part like this:
    dataset = load_dataset(
        "orionweller/dolma_20_percent_sample",
        data_files={"data": f"{folder.path}/*"},
    )["data"]
    # The dataset will have these keys: ["id", "text", "added", "created", "source"]
```

This dataset, dolma_20_percent_sample, is a sample of Dolma split into many parts, each stored in its own top-level folder (for example `algebraic_stack_train_0000`). Every record has the keys `id`, `text`, `added`, `created`, and `source`.
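The example above finds the parts by filtering the top-level repository listing. A minimal sketch of that filtering as a standalone helper, assuming only that each entry exposes a `.path` attribute (as the `RepoFile`/`RepoFolder` objects returned by `huggingface_hub.list_repo_tree` do). The helper name `part_folders` is hypothetical, and the stand-in entries let it run without network access.

```python
from types import SimpleNamespace
from typing import Iterable, List

# Top-level files that are not data parts (mirrors the skip list in the example above)
METADATA_FILES = {"README.md", ".gitattributes"}

def part_folders(entries: Iterable) -> List[str]:
    """Return the sorted paths of the data-part folders.

    `entries` is anything yielding objects with a `.path` attribute, e.g. the
    output of huggingface_hub.list_repo_tree(..., repo_type="dataset").
    """
    return sorted(e.path for e in entries if e.path not in METADATA_FILES)

# Offline usage with stand-in entries instead of a real Hub listing
fake_tree = [
    SimpleNamespace(path="README.md"),
    SimpleNamespace(path=".gitattributes"),
    SimpleNamespace(path="algebraic_stack_train_0000"),
]
print(part_folders(fake_tree))  # ['algebraic_stack_train_0000']
```

Against the real repository, passing the result of `huggingface_hub.list_repo_tree("orionweller/dolma_20_percent_sample", repo_type="dataset")` should yield the part folders, 2,419 according to the dataset card.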

Provided by:

orionweller

Summary of original information

Dataset overview

Dataset name

Name: dolma_20_percent_sample
Owner: orionweller

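Dataset structure

Data files: The dataset consists of multiple parts, each stored in its own folder, e.g. algebraic_stack_train_0000.
Data keys: The loaded dataset contains the following keys: ["id", "text", "added", "created", "source"]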
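Since every record carries a `source` field, a loaded part can be summarized per source. A minimal sketch, assuming records arrive as plain dicts with the keys listed above (iterating a `datasets.Dataset` yields one dict per row). The helper `count_by_source` and the sample records are hypothetical; the formats of `added` and `created` are not documented here, so the sketch does not interpret them.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

# Keys documented for every record in this dataset
EXPECTED_KEYS = {"id", "text", "added", "created", "source"}

def count_by_source(records: Iterable[Mapping]) -> Dict[str, int]:
    """Count records per `source` value, checking each record has the documented keys."""
    counts: Counter = Counter()
    for record in records:
        missing = EXPECTED_KEYS.difference(record.keys())
        if missing:
            raise KeyError(f"record {record.get('id')!r} is missing keys: {sorted(missing)}")
        counts[record["source"]] += 1
    return dict(counts)

# Hypothetical in-memory records; real ones would come from load_dataset(...)["data"]
sample = [
    {"id": "a", "text": "x", "added": None, "created": None, "source": "s1"},
    {"id": "b", "text": "y", "added": None, "created": None, "source": "s1"},
    {"id": "c", "text": "z", "added": None, "created": None, "source": "s2"},
]
print(count_by_source(sample))  # {'s1': 2, 's2': 1}
```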

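Loading the dataset

Loading method: Use the `load_dataset` function to load a specific part of the dataset from the Hugging Face Hub.
Example code:

```python
from datasets import load_dataset

# Folder name of the part to load, e.g. the one mentioned under "Dataset structure"
folder_name = "algebraic_stack_train_0000"
dataset = load_dataset("orionweller/dolma_20_percent_sample", data_files={"data": f"{folder_name}/*"})["data"]
```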
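To load more than one part in a single call, `data_files` also accepts a list of patterns per split. A minimal sketch, assuming the part folders sit at the top level of the repository as in the examples above; `build_data_files` is a hypothetical helper, and the network call is left as a comment so the snippet runs offline.

```python
from typing import Dict, List, Sequence

def build_data_files(parts: Sequence[str], split: str = "data") -> Dict[str, List[str]]:
    """Map one split name to a glob pattern per part folder, e.g. "algebraic_stack_train_0000/*"."""
    if not parts:
        raise ValueError("at least one part folder is required")
    return {split: [f"{part.rstrip('/')}/*" for part in parts]}

data_files = build_data_files(["algebraic_stack_train_0000"])
print(data_files)  # {'data': ['algebraic_stack_train_0000/*']}

# With network access, the mapping can be passed straight to load_dataset:
# from datasets import load_dataset
# dataset = load_dataset("orionweller/dolma_20_percent_sample", data_files=data_files)["data"]
```

Adding `streaming=True` to the `load_dataset` call returns an `IterableDataset` instead, which avoids downloading every matched file before iteration starts.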
