chenhunghan/cosmopedia-kubernetes
Source: Hugging Face. Updated 2024-03-12, indexed 2024-06-11.
Download link:
https://hf-mirror.com/datasets/chenhunghan/cosmopedia-kubernetes
Resource description:
---
dataset_info:
- config_name: auto_math_text
features:
- name: text
dtype: string
- name: format
dtype: string
- name: audience
dtype: string
splits:
- name: train
num_bytes: 8478.400880047388
num_examples: 3
download_size: 16747
dataset_size: 8478.400880047388
- config_name: khanacademy
features:
- name: text
dtype: string
- name: format
dtype: string
- name: audience
dtype: string
splits:
- name: train
num_bytes: 0
num_examples: 0
download_size: 932
dataset_size: 0
- config_name: openstax
features:
- name: text
dtype: string
- name: format
dtype: string
- name: audience
dtype: string
splits:
- name: train
num_bytes: 11689.100077573377
num_examples: 3
download_size: 26265
dataset_size: 11689.100077573377
- config_name: stanford
features:
- name: text
dtype: string
- name: format
dtype: string
- name: audience
dtype: string
splits:
- name: train
num_bytes: 1355287.8612738524
num_examples: 272
download_size: 1060451
dataset_size: 1355287.8612738524
- config_name: stories
features:
- name: text
dtype: string
- name: format
dtype: string
- name: audience
dtype: string
splits:
- name: train
num_bytes: 2934322.4468095503
num_examples: 1064
download_size: 1946246
dataset_size: 2934322.4468095503
- config_name: web_samples_v1
features:
- name: text
dtype: string
- name: format
dtype: string
- name: audience
dtype: string
splits:
- name: train
num_bytes: 63360415.08870015
num_examples: 15691
download_size: 45350020
dataset_size: 63360415.08870015
- config_name: web_samples_v2
features:
- name: text
dtype: string
- name: format
dtype: string
- name: audience
dtype: string
splits:
- name: train
num_bytes: 57739423.97337159
num_examples: 14318
download_size: 40353548
dataset_size: 57739423.97337159
- config_name: wikihow
features:
- name: text
dtype: string
- name: format
dtype: string
- name: audience
dtype: string
splits:
- name: train
num_bytes: 59943.793823350505
num_examples: 13
download_size: 50577
dataset_size: 59943.793823350505
configs:
- config_name: auto_math_text
data_files:
- split: train
path: auto_math_text/train-*
- config_name: khanacademy
data_files:
- split: train
path: khanacademy/train-*
- config_name: openstax
data_files:
- split: train
path: openstax/train-*
- config_name: stanford
data_files:
- split: train
path: stanford/train-*
- config_name: stories
data_files:
- split: train
path: stories/train-*
- config_name: web_samples_v1
data_files:
- split: train
path: web_samples_v1/train-*
- config_name: web_samples_v2
data_files:
- split: train
path: web_samples_v2/train-*
- config_name: wikihow
data_files:
- split: train
path: wikihow/train-*
license: apache-2.0
language:
- en
tags:
- synthetic
- k8s
- kubernetes
size_categories:
- 10K<n<100K
---
# Cosmopedia-kubernetes v0.1
An unmodified subset of Cosmopedia v0.1, filtered by the keywords `k8s` and `kubernetes`.
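The card does not say exactly how the keyword filter was applied. The following is a minimal sketch that assumes a case-insensitive substring match on the `text` field only. The names `KEYWORDS`, `mentions_kubernetes` and `filter_rows` are hypothetical, and the sample rows are invented for illustration.

```python
from typing import Iterable, Iterator, Mapping

# Keywords named in this card; the matching rule below is an assumption.
KEYWORDS = ("k8s", "kubernetes")


def mentions_kubernetes(text: str, keywords: Iterable[str] = KEYWORDS) -> bool:
    """Return True if any keyword occurs in `text` (case-insensitive substring match)."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def filter_rows(rows: Iterable[Mapping[str, str]]) -> Iterator[Mapping[str, str]]:
    """Yield rows whose `text` mentions a keyword; rows are passed through unmodified."""
    for row in rows:
        if mentions_kubernetes(row["text"]):
            yield row


if __name__ == "__main__":
    # Hypothetical sample rows, not taken from the dataset.
    sample = [
        {"text": "Deploying a Pod on Kubernetes", "format": "blogpost", "audience": "general"},
        {"text": "A story about baking bread", "format": "story", "audience": "children"},
        {"text": "Tuning k8s resource limits", "format": "textbook", "audience": "college students"},
    ]
    kept = list(filter_rows(sample))
    print(len(kept))  # 2
```

Because the rows are yielded unchanged, the sketch matches the "unmodified subset" description. A stricter variant could match on word boundaries instead of substrings, but the card does not say which rule was used.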
### Dataset splits
The splits are the same as in [Cosmopedia v0.1](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia#dataset-splits).
### Dataset features
The dataset has the following features:
- text: the synthetically generated content from Cosmopedia v0.1.
- format: the style of `text`, for example a textbook, a blog post, or a story. It can also be inferred from the prompt.
- audience: the target audience defined in the prompt.
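Each row therefore carries three string fields. As a sketch, the schema can be written as a `TypedDict`, and rows can be tallied by `format` or `audience`. The type name `CosmopediaRow`, the helper `tally`, and the sample values are hypothetical, because this card does not list the actual `format` or `audience` values.

```python
from collections import Counter
from typing import Iterable, List, TypedDict


class CosmopediaRow(TypedDict):
    """One row of any config: all three features are plain strings."""
    text: str
    format: str
    audience: str


def tally(rows: Iterable[CosmopediaRow], field: str) -> Counter:
    """Count how many rows share each value of `field` ('format' or 'audience')."""
    if field not in ("format", "audience"):
        raise ValueError(f"unsupported field: {field!r}")
    return Counter(row[field] for row in rows)


# Invented placeholder rows; real values come from the dataset itself.
rows: List[CosmopediaRow] = [
    {"text": "placeholder text", "format": "textbook", "audience": "college_students"},
    {"text": "placeholder text", "format": "blogpost", "audience": "general"},
    {"text": "placeholder text", "format": "textbook", "audience": "general"},
]
print(tally(rows, "format").most_common())  # [('textbook', 2), ('blogpost', 1)]
```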
Provider:
chenhunghan
Summary of original information
Dataset overview
Dataset configurations
| Config name | Features | Data type | Train size (bytes) | Train examples | Download size (bytes) | Dataset size (bytes) |
|---|---|---|---|---|---|---|
| auto_math_text | text, format, audience | string | 8478.400880047388 | 3 | 16747 | 8478.400880047388 |
| khanacademy | text, format, audience | string | 0 | 0 | 932 | 0 |
| openstax | text, format, audience | string | 11689.100077573377 | 3 | 26265 | 11689.100077573377 |
| stanford | text, format, audience | string | 1355287.8612738524 | 272 | 1060451 | 1355287.8612738524 |
| stories | text, format, audience | string | 2934322.4468095503 | 1064 | 1946246 | 2934322.4468095503 |
| web_samples_v1 | text, format, audience | string | 63360415.08870015 | 15691 | 45350020 | 63360415.08870015 |
| web_samples_v2 | text, format, audience | string | 57739423.97337159 | 14318 | 40353548 | 57739423.97337159 |
| wikihow | text, format, audience | string | 59943.793823350505 | 13 | 50577 | 59943.793823350505 |
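The per-config counts above can be cross-checked against the `size_categories: 10K<n<100K` tag in the metadata. This small sketch sums the `num_examples` and `download_size` values copied from the table and computes the share of rows that come from the two web-sample configs.

```python
# (num_examples, download_size in bytes) per config, copied from the table above.
CONFIGS = {
    "auto_math_text": (3, 16747),
    "khanacademy": (0, 932),
    "openstax": (3, 26265),
    "stanford": (272, 1060451),
    "stories": (1064, 1946246),
    "web_samples_v1": (15691, 45350020),
    "web_samples_v2": (14318, 40353548),
    "wikihow": (13, 50577),
}

total_examples = sum(n for n, _ in CONFIGS.values())
total_download = sum(size for _, size in CONFIGS.values())
web_share = (CONFIGS["web_samples_v1"][0] + CONFIGS["web_samples_v2"][0]) / total_examples

print(total_examples)      # 31364
print(total_download)      # 88804786 (bytes, about 88.8 MB)
print(f"{web_share:.1%}")  # 95.7%
```

The total of 31,364 examples falls inside the declared 10K<n<100K bucket. About 96% of the rows come from `web_samples_v1` and `web_samples_v2`, and `khanacademy` contributes no rows.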
Dataset file paths
| Config name | Train split path |
|---|---|
| auto_math_text | auto_math_text/train-* |
| khanacademy | khanacademy/train-* |
| openstax | openstax/train-* |
| stanford | stanford/train-* |
| stories | stories/train-* |
| web_samples_v1 | web_samples_v1/train-* |
| web_samples_v2 | web_samples_v2/train-* |
| wikihow | wikihow/train-* |
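Each config reads its train split from files that match a `<config>/train-*` pattern. As an illustration only, the mapping can be approximated with Python's `fnmatch`. The shard file names in the example are hypothetical. `fnmatch` is a stand-in here, not the pattern resolution used by the Hugging Face `datasets` library. For example, `fnmatch`'s `*` also matches `/`.

```python
from fnmatch import fnmatch
from typing import Optional

# Train-split data_files patterns, copied from the table above.
TRAIN_PATTERNS = {
    "auto_math_text": "auto_math_text/train-*",
    "khanacademy": "khanacademy/train-*",
    "openstax": "openstax/train-*",
    "stanford": "stanford/train-*",
    "stories": "stories/train-*",
    "web_samples_v1": "web_samples_v1/train-*",
    "web_samples_v2": "web_samples_v2/train-*",
    "wikihow": "wikihow/train-*",
}


def config_for(path: str) -> Optional[str]:
    """Return the config whose train pattern matches `path`, or None if none does."""
    for config, pattern in TRAIN_PATTERNS.items():
        if fnmatch(path, pattern):
            return config
    return None


# Hypothetical shard names; the actual file names in the repository may differ.
print(config_for("stanford/train-00000-of-00001.parquet"))  # stanford
print(config_for("README.md"))                              # None
```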
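Dataset features
- text: the synthetically generated content.
- format: the style of `text`, for example a textbook, a blog post, or a story.
- audience: the target audience.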



