sanchit-gandhi/cosmopedia_web_textbooks_logprobs
Source: Hugging Face. Updated 2024-05-08. Indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/sanchit-gandhi/cosmopedia_web_textbooks_logprobs
Resource description:
---
dataset_info:
  features:
  - name: prompt
    dtype: string
  - name: text_token_length
    dtype: int64
  - name: text
    dtype: string
  - name: seed_data
    dtype: string
  - name: format
    dtype: string
  - name: audience
    dtype: string
  - name: logprobs
    dtype: float64
  splits:
  - name: train
    num_bytes: 29924148869
    num_examples: 5000000
  download_size: 16394553341
  dataset_size: 29924148869
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
The dataset has seven features: prompt, text_token_length, text, seed_data, format, audience, and logprobs. Five of them are strings; text_token_length is int64 and logprobs is float64. The dataset ships a single train split of 5,000,000 examples. The download size is 16,394,553,341 bytes (about 16.4 GB), and the dataset size is 29,924,148,869 bytes (about 29.9 GB).
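Because the download is roughly 16 GB, it can help to look at a few records before fetching everything. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable; `stream_examples` and `summarize` are hypothetical helper names introduced here for illustration, not part of the dataset or the library.

```python
from itertools import islice

REPO_ID = "sanchit-gandhi/cosmopedia_web_textbooks_logprobs"


def stream_examples(n=3, repo_id=REPO_ID):
    """Return the first n records of the train split via streaming."""
    # Lazy import: `datasets` is only needed when records are actually fetched.
    from datasets import load_dataset

    # streaming=True iterates over the shards remotely instead of
    # downloading the whole dataset first.
    ds = load_dataset(repo_id, split="train", streaming=True)
    return list(islice(ds, n))


def summarize(record, preview_chars=80):
    """Condense one record into a short dict for quick inspection."""
    return {
        "format": record["format"],
        "audience": record["audience"],
        "text_token_length": record["text_token_length"],
        "logprobs": record["logprobs"],
        "text_preview": record["text"][:preview_chars],
    }
```

Calling `[summarize(r) for r in stream_examples(3)]` requires network access. The field names come from the feature list in the card; everything else is an assumption of this sketch.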
Provided by:
sanchit-gandhi
Summary of original information
Dataset overview
Dataset features
- prompt: string
- text_token_length: integer (int64)
- text: string
- seed_data: string
- format: string
- audience: string
- logprobs: floating point (float64)
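The feature list above can be turned into a simple record check. This is a sketch under stated assumptions: the mapping from card dtypes to Python types (`string` to `str`, `int64` to `int`, `float64` to `float`) is a simplification, null values are not handled, and `schema_errors` is a hypothetical helper name.

```python
# Feature names and dtypes as listed in the dataset card.
FEATURES = {
    "prompt": "string",
    "text_token_length": "int64",
    "text": "string",
    "seed_data": "string",
    "format": "string",
    "audience": "string",
    "logprobs": "float64",
}

# Assumed mapping from card dtypes to plain Python types.
PYTHON_TYPES = {"string": str, "int64": int, "float64": float}


def schema_errors(record):
    """Return a list of human-readable schema problems (empty if none)."""
    errors = []
    for name, dtype in FEATURES.items():
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, PYTHON_TYPES[dtype]):
            errors.append(f"{name}: expected {dtype}, got {type(value).__name__}")
    for name in record:
        if name not in FEATURES:
            errors.append(f"unexpected field: {name}")
    return errors
```

An empty list means the record has exactly the seven expected fields with matching types.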
Dataset splits
- train:
  - Size: 29,924,148,869 bytes
  - Number of examples: 5,000,000
Dataset size
- Download size: 16,394,553,341 bytes
- Dataset size: 29,924,148,869 bytes
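The byte counts above are easier to reason about after a little arithmetic. The sketch below uses only the three numbers given in the card; the helper names are illustrative, and the interpretation of the ratio as compression is an assumption (the card does not say how the files are stored).

```python
DOWNLOAD_BYTES = 16_394_553_341
DATASET_BYTES = 29_924_148_869
NUM_EXAMPLES = 5_000_000


def to_gb(n_bytes):
    """Decimal gigabytes (10**9 bytes)."""
    return n_bytes / 1e9


def to_gib(n_bytes):
    """Binary gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


# Fraction of the in-memory dataset size that must be downloaded.
download_ratio = DOWNLOAD_BYTES / DATASET_BYTES

# Average size of one example in the loaded dataset.
bytes_per_example = DATASET_BYTES / NUM_EXAMPLES

print(f"download: {to_gb(DOWNLOAD_BYTES):.1f} GB ({to_gib(DOWNLOAD_BYTES):.1f} GiB)")
print(f"dataset:  {to_gb(DATASET_BYTES):.1f} GB ({to_gib(DATASET_BYTES):.1f} GiB)")
print(f"download / dataset ratio: {download_ratio:.2f}")
print(f"average bytes per example: {bytes_per_example:.0f}")
```

The download is a little over half the dataset size, and the average example is on the order of 6 KB.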
Configuration
- config_name: default
  - data_files:
    - split: train
      path: data/train-*
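The `data/train-*` entry is a glob pattern that selects the files belonging to the train split. As a rough illustration of how such a pattern behaves, the sketch below uses Python's standard `fnmatch` module; the file names are hypothetical examples, and the actual resolution on the Hub is done by the `datasets` library with its own matching logic, which may differ in details.

```python
from fnmatch import fnmatchcase

PATTERN = "data/train-*"


def select_shards(paths, pattern=PATTERN):
    """Keep only the paths that match the split's glob pattern."""
    # fnmatchcase is case-sensitive on every platform.
    return [p for p in paths if fnmatchcase(p, pattern)]


# Hypothetical repository listing, for illustration only.
example_paths = [
    "README.md",
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
    "data/test-00000-of-00001.parquet",
]

train_shards = select_shards(example_paths)
```

With this listing, only the two `data/train-...` entries are selected.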



