
chenhunghan/cosmopedia-kubernetes

Source: Hugging Face. Updated 2024-03-12. Indexed 2024-06-11.
Download link:
https://hf-mirror.com/datasets/chenhunghan/cosmopedia-kubernetes
Resource description:
---
dataset_info:
- config_name: auto_math_text
  features:
  - name: text
    dtype: string
  - name: format
    dtype: string
  - name: audience
    dtype: string
  splits:
  - name: train
    num_bytes: 8478.400880047388
    num_examples: 3
  download_size: 16747
  dataset_size: 8478.400880047388
- config_name: khanacademy
  features:
  - name: text
    dtype: string
  - name: format
    dtype: string
  - name: audience
    dtype: string
  splits:
  - name: train
    num_bytes: 0
    num_examples: 0
  download_size: 932
  dataset_size: 0
- config_name: openstax
  features:
  - name: text
    dtype: string
  - name: format
    dtype: string
  - name: audience
    dtype: string
  splits:
  - name: train
    num_bytes: 11689.100077573377
    num_examples: 3
  download_size: 26265
  dataset_size: 11689.100077573377
- config_name: stanford
  features:
  - name: text
    dtype: string
  - name: format
    dtype: string
  - name: audience
    dtype: string
  splits:
  - name: train
    num_bytes: 1355287.8612738524
    num_examples: 272
  download_size: 1060451
  dataset_size: 1355287.8612738524
- config_name: stories
  features:
  - name: text
    dtype: string
  - name: format
    dtype: string
  - name: audience
    dtype: string
  splits:
  - name: train
    num_bytes: 2934322.4468095503
    num_examples: 1064
  download_size: 1946246
  dataset_size: 2934322.4468095503
- config_name: web_samples_v1
  features:
  - name: text
    dtype: string
  - name: format
    dtype: string
  - name: audience
    dtype: string
  splits:
  - name: train
    num_bytes: 63360415.08870015
    num_examples: 15691
  download_size: 45350020
  dataset_size: 63360415.08870015
- config_name: web_samples_v2
  features:
  - name: text
    dtype: string
  - name: format
    dtype: string
  - name: audience
    dtype: string
  splits:
  - name: train
    num_bytes: 57739423.97337159
    num_examples: 14318
  download_size: 40353548
  dataset_size: 57739423.97337159
- config_name: wikihow
  features:
  - name: text
    dtype: string
  - name: format
    dtype: string
  - name: audience
    dtype: string
  splits:
  - name: train
    num_bytes: 59943.793823350505
    num_examples: 13
  download_size: 50577
  dataset_size: 59943.793823350505
configs:
- config_name: auto_math_text
  data_files:
  - split: train
    path: auto_math_text/train-*
- config_name: khanacademy
  data_files:
  - split: train
    path: khanacademy/train-*
- config_name: openstax
  data_files:
  - split: train
    path: openstax/train-*
- config_name: stanford
  data_files:
  - split: train
    path: stanford/train-*
- config_name: stories
  data_files:
  - split: train
    path: stories/train-*
- config_name: web_samples_v1
  data_files:
  - split: train
    path: web_samples_v1/train-*
- config_name: web_samples_v2
  data_files:
  - split: train
    path: web_samples_v2/train-*
- config_name: wikihow
  data_files:
  - split: train
    path: wikihow/train-*
license: apache-2.0
language:
- en
tags:
- synthetic
- k8s
- kubernetes
size_categories:
- 10K<n<100K
---

# Cosmopedia-kubernetes v0.1

An unmodified subset of Cosmopedia v0.1 data, filtered by the keywords `k8s` and `kubernetes`.

### Dataset splits

The splits are the same as in [Cosmopedia v0.1](https://huggingface.co/datasets/HuggingFaceTB/cosmopedia#dataset-splits).

### Dataset features

The dataset has the following features:

- text: the synthetically generated content from Cosmopedia v0.1.
- format: the style of `text`, for example a textbook, a blogpost, or a story. It can also be inferred from the prompt.
- audience: the target audience defined in the prompt.
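The card states that the subset was selected by the keywords `k8s` and `kubernetes`, but it does not describe the matching rule. The following is a minimal sketch of one plausible rule, assuming a case-insensitive substring match on the `text` field. The helper names and the sample rows are hypothetical and are not taken from the dataset.

```python
from typing import Dict, Iterable, Iterator, Sequence

# Keywords named in the dataset card.
KEYWORDS: Sequence[str] = ("k8s", "kubernetes")


def mentions_keyword(text: str, keywords: Sequence[str] = KEYWORDS) -> bool:
    """Return True if any keyword occurs in text (assumed: case-insensitive substring match)."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in keywords)


def filter_rows(rows: Iterable[Dict[str, str]],
                keywords: Sequence[str] = KEYWORDS) -> Iterator[Dict[str, str]]:
    """Yield rows unchanged when their `text` field mentions a keyword."""
    for row in rows:
        if mentions_keyword(row["text"], keywords):
            yield row


# Hypothetical rows that only mimic the three documented features.
sample_rows = [
    {"text": "Deploying a service on Kubernetes.", "format": "blogpost", "audience": "general"},
    {"text": "A short story about a lighthouse.", "format": "story", "audience": "general"},
    {"text": "Scaling pods in a k8s cluster.", "format": "textbook", "audience": "general"},
]

kept = list(filter_rows(sample_rows))
print(len(kept))
```

Rows pass through untouched, which matches the card's description of the subset as "unmodified"; only the selection step is sketched here.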
Provider:
chenhunghan
Summary of original information

Dataset overview

Dataset configurations

| Config name | Feature name | Data type | Train size (bytes) | Train examples | Download size (bytes) | Dataset size (bytes) |
|---|---|---|---|---|---|---|
| auto_math_text | text | string | 8478.400880047388 | 3 | 16747 | 8478.400880047388 |
| khanacademy | text | string | 0 | 0 | 932 | 0 |
| openstax | text | string | 11689.100077573377 | 3 | 26265 | 11689.100077573377 |
| stanford | text | string | 1355287.8612738524 | 272 | 1060451 | 1355287.8612738524 |
| stories | text | string | 2934322.4468095503 | 1064 | 1946246 | 2934322.4468095503 |
| web_samples_v1 | text | string | 63360415.08870015 | 15691 | 45350020 | 63360415.08870015 |
| web_samples_v2 | text | string | 57739423.97337159 | 14318 | 40353548 | 57739423.97337159 |
| wikihow | text | string | 59943.793823350505 | 13 | 50577 | 59943.793823350505 |
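As a quick consistency check, the per-configuration example counts listed above can be summed and compared with the `10K<n<100K` size category declared in the card. This is a small sketch that only uses the numbers shown on this page; the variable names are illustrative.

```python
# Train example counts per configuration, copied from the listing above.
num_examples = {
    "auto_math_text": 3,
    "khanacademy": 0,
    "openstax": 3,
    "stanford": 272,
    "stories": 1064,
    "web_samples_v1": 15691,
    "web_samples_v2": 14318,
    "wikihow": 13,
}

total = sum(num_examples.values())

# Share of rows contributed by the two web_samples configurations.
web_total = num_examples["web_samples_v1"] + num_examples["web_samples_v2"]
web_share = web_total / total

print(total)                      # 31364
print(10_000 < total < 100_000)   # True, consistent with the 10K<n<100K tag
print(f"{web_share:.1%}")
```

The two `web_samples` configurations account for nearly all rows, so the other configurations are very small, and `khanacademy` is empty.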

Dataset file paths

| Config name | Train split path |
|---|---|
| auto_math_text | auto_math_text/train-* |
| khanacademy | khanacademy/train-* |
| openstax | openstax/train-* |
| stanford | stanford/train-* |
| stories | stories/train-* |
| web_samples_v1 | web_samples_v1/train-* |
| web_samples_v2 | web_samples_v2/train-* |
| wikihow | wikihow/train-* |
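Each configuration maps to a `<config>/train-*` file pattern, so a single configuration can be loaded by name. The sketch below assumes the Hugging Face `datasets` library is installed and that the repository is reachable under the ID `chenhunghan/cosmopedia-kubernetes`. The helper names `train_pattern` and `load_config` are hypothetical.

```python
# Configuration names as listed on this page.
CONFIGS = (
    "auto_math_text", "khanacademy", "openstax", "stanford",
    "stories", "web_samples_v1", "web_samples_v2", "wikihow",
)

REPO_ID = "chenhunghan/cosmopedia-kubernetes"


def train_pattern(config: str) -> str:
    """Return the train-split file pattern for a configuration, as listed above."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    return f"{config}/train-*"


def load_config(config: str):
    """Load the train split of one configuration (needs network access)."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    # Imported lazily so the helpers above work without the library installed.
    from datasets import load_dataset
    return load_dataset(REPO_ID, config, split="train")


print(train_pattern("stories"))  # stories/train-*
```

Calling `load_config("stories")` would download that configuration and return a dataset with the `text`, `format`, and `audience` columns. The call is left out of the block because it requires network access.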

Dataset features

  • text: the synthetically generated content.
  • format: the style of text, for example a textbook, a blog post, or a story.
  • audience: the target audience.
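Because every row carries `format` and `audience` alongside `text`, the rows can be grouped or filtered by style before use. The sketch below runs on hypothetical in-memory rows. The label values are illustrative only and are not a confirmed list of the values present in the dataset.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical rows that mimic the three documented features.
rows: List[Dict[str, str]] = [
    {"text": "Intro to Kubernetes pods.", "format": "textbook", "audience": "college_students"},
    {"text": "Why we moved to k8s.", "format": "blogpost", "audience": "general"},
    {"text": "Kubernetes networking basics.", "format": "textbook", "audience": "general"},
]

# Count rows per style.
format_counts = Counter(row["format"] for row in rows)

# Keep only the rows written in one style.
textbook_rows = [row for row in rows if row["format"] == "textbook"]

print(format_counts["textbook"], format_counts["blogpost"])  # 2 1
print(len(textbook_rows))                                    # 2
```

The same comprehension applies to `audience`, for example to keep only rows aimed at one readership.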