andrewatef/PText
Source: Hugging Face. Updated 2024-01-29; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/andrewatef/PText
Resource description:
---
dataset_info:
- config_name: articles
  features:
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: url
    dtype: string
  splits:
  - name: train
    num_bytes: 12376328.0
    num_examples: 2040
  download_size: 5623581
  dataset_size: 12376328.0
- config_name: articles2
  features:
  - name: title
    dtype: string
  - name: description
    dtype: string
  - name: reading_time_minutes
    dtype: int64
  - name: tags
    dtype: string
  - name: body_markdown
    dtype: string
  splits:
  - name: train
    num_bytes: 2567410.0
    num_examples: 1090
  download_size: 1362235
  dataset_size: 2567410.0
- config_name: llama
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 291896975.0
    num_examples: 1257591
  download_size: 153320452
  dataset_size: 291896975.0
- config_name: llama2
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 170086868.0
    num_examples: 516177
  download_size: 83326571
  dataset_size: 170086868.0
- config_name: llama3
  features:
  - name: Instruction
    dtype: string
  - name: Response
    dtype: string
  splits:
  - name: train
    num_bytes: 142729487.0
    num_examples: 516177
  download_size: 101890981
  dataset_size: 142729487.0
- config_name: llama4
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 157182443.0
    num_examples: 516177
  download_size: 82734120
  dataset_size: 157182443.0
- config_name: llama5
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 53373019.0
    num_examples: 172059
  download_size: 27923481
  dataset_size: 53373019.0
- config_name: llama6
  features:
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: instruction
    dtype: string
  splits:
  - name: train
    num_bytes: 51480370.0
    num_examples: 172059
  download_size: 33775616
  dataset_size: 51480370.0
- config_name: llama7
  features:
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: instruction
    dtype: string
  splits:
  - name: train
    num_bytes: 3759851.0
    num_examples: 13530
  download_size: 2287275
  dataset_size: 3759851.0
- config_name: llama8
  features:
  - name: input
    dtype: string
  - name: chosen
    dtype: string
  - name: rejected
    dtype: string
  - name: instruction
    dtype: string
  splits:
  - name: train
    num_bytes: 101496004.9890677
    num_examples: 120441
  - name: test
    num_bytes: 43498649.0109323
    num_examples: 51618
  download_size: 74071830
  dataset_size: 144994654.0
- config_name: phi2
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 275548292.0
    num_examples: 1257591
  download_size: 151999212
  dataset_size: 275548292.0
- config_name: summary
  features:
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: instruction
    dtype: string
  splits:
  - name: train
    num_bytes: 1252702430.0
    num_examples: 287113
  download_size: 771120161
  dataset_size: 1252702430.0
- config_name: summary2
  features:
  - name: document
    dtype: string
  - name: summary
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: instruction
    dtype: string
  splits:
  - name: train
    num_bytes: 1117818826.0
    num_examples: 44972
  download_size: 648248844
  dataset_size: 1117818826.0
configs:
- config_name: articles
  data_files:
  - split: train
    path: articles/train-*
- config_name: articles2
  data_files:
  - split: train
    path: articles2/train-*
- config_name: llama
  data_files:
  - split: train
    path: llama/train-*
- config_name: llama2
  data_files:
  - split: train
    path: llama2/train-*
- config_name: llama3
  data_files:
  - split: train
    path: llama3/train-*
- config_name: llama4
  data_files:
  - split: train
    path: llama4/train-*
- config_name: llama5
  data_files:
  - split: train
    path: llama5/train-*
- config_name: llama6
  data_files:
  - split: train
    path: llama6/train-*
- config_name: llama7
  data_files:
  - split: train
    path: llama7/train-*
- config_name: llama8
  data_files:
  - split: train
    path: llama8/train-*
  - split: test
    path: llama8/test-*
- config_name: phi2
  data_files:
  - split: train
    path: phi2/train-*
- config_name: summary
  data_files:
  - split: train
    path: summary/train-*
- config_name: summary2
  data_files:
  - split: train
    path: summary2/train-*
---
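As a minimal sketch of how one of the configurations declared in the metadata might be loaded, the snippet below assumes the Hugging Face `datasets` library is installed and that the repository is still publicly reachable. The helper `load_ptext` and the `SPLITS` table are hypothetical names introduced here for illustration; the table only mirrors the `configs` section of the metadata.

```python
# Splits declared for each configuration in the dataset metadata.
SPLITS = {
    "articles": ["train"],
    "articles2": ["train"],
    "llama": ["train"],
    "llama2": ["train"],
    "llama3": ["train"],
    "llama4": ["train"],
    "llama5": ["train"],
    "llama6": ["train"],
    "llama7": ["train"],
    "llama8": ["train", "test"],
    "phi2": ["train"],
    "summary": ["train"],
    "summary2": ["train"],
}

REPO_ID = "andrewatef/PText"


def load_ptext(config, split="train"):
    """Load one split of one configuration (hypothetical helper, not part of the dataset)."""
    if config not in SPLITS:
        raise ValueError(f"unknown config {config!r}; expected one of {sorted(SPLITS)}")
    if split not in SPLITS[config]:
        raise ValueError(f"config {config!r} has no split {split!r}")
    # Imported lazily so the validation above works even without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name=config, split=split)


# Example call (triggers a download, so it is left commented out):
# ds = load_ptext("llama8", split="test")
# print(ds.column_names)
```

Validating the configuration name first gives a clearer error than a failed download. To fetch through the mirror given in the download link, the usual approach is to set the `HF_ENDPOINT` environment variable to `https://hf-mirror.com` before importing `datasets`; the source does not state this, so treat it as an assumption.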
Provider:
andrewatef
Summary of original information
Dataset overview
Dataset configurations
Config name: articles
- Features: input (string), output (string), url (string)
- Splits: train (12376328.0 bytes, 2040 examples)
- Download size: 5623581 bytes
- Dataset size: 12376328.0 bytes
Config name: articles2
- Features: title (string), description (string), reading_time_minutes (int64), tags (string), body_markdown (string)
- Splits: train (2567410.0 bytes, 1090 examples)
- Download size: 1362235 bytes
- Dataset size: 2567410.0 bytes
Config name: llama
- Features: text (string)
- Splits: train (291896975.0 bytes, 1257591 examples)
- Download size: 153320452 bytes
- Dataset size: 291896975.0 bytes
Config name: llama2
- Features: text (string)
- Splits: train (170086868.0 bytes, 516177 examples)
- Download size: 83326571 bytes
- Dataset size: 170086868.0 bytes
Config name: llama3
- Features: Instruction (string), Response (string)
- Splits: train (142729487.0 bytes, 516177 examples)
- Download size: 101890981 bytes
- Dataset size: 142729487.0 bytes
Config name: llama4
- Features: text (string)
- Splits: train (157182443.0 bytes, 516177 examples)
- Download size: 82734120 bytes
- Dataset size: 157182443.0 bytes
Config name: llama5
- Features: text (string)
- Splits: train (53373019.0 bytes, 172059 examples)
- Download size: 27923481 bytes
- Dataset size: 53373019.0 bytes
Config name: llama6
- Features: input (string), output (string), instruction (string)
- Splits: train (51480370.0 bytes, 172059 examples)
- Download size: 33775616 bytes
- Dataset size: 51480370.0 bytes
Config name: llama7
- Features: input (string), output (string), instruction (string)
- Splits: train (3759851.0 bytes, 13530 examples)
- Download size: 2287275 bytes
- Dataset size: 3759851.0 bytes
Config name: llama8
- Features: input (string), chosen (string), rejected (string), instruction (string)
- Splits: train (101496004.9890677 bytes, 120441 examples); test (43498649.0109323 bytes, 51618 examples)
- Download size: 74071830 bytes
- Dataset size: 144994654.0 bytes
Config name: phi2
- Features: text (string)
- Splits: train (275548292.0 bytes, 1257591 examples)
- Download size: 151999212 bytes
- Dataset size: 275548292.0 bytes
Config name: summary
- Features: input (string), output (string), instruction (string)
- Splits: train (1252702430.0 bytes, 287113 examples)
- Download size: 771120161 bytes
- Dataset size: 1252702430.0 bytes
Config name: summary2
- Features: document (string), summary (string), input (string), output (string), instruction (string)
- Splits: train (1117818826.0 bytes, 44972 examples)
- Download size: 648248844 bytes
- Dataset size: 1117818826.0 bytes
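The fractional byte counts reported for the `llama8` splits look like the total dataset size prorated by example count rather than measured per split. That is an inference from the numbers, not something the source states. The short check below, using only figures copied from the `llama8` entry, shows the arithmetic; the variable names are illustrative.

```python
# Figures copied from the llama8 entry.
train_examples = 120441
test_examples = 51618
dataset_size = 144994654.0
reported_train_bytes = 101496004.9890677
reported_test_bytes = 43498649.0109323

total_examples = train_examples + test_examples
train_fraction = train_examples / total_examples

# Assumption under test: bytes were apportioned in proportion to example counts.
estimated_train_bytes = dataset_size * train_fraction
estimated_test_bytes = dataset_size - estimated_train_bytes

print("total examples:", total_examples)
print("train fraction:", round(train_fraction, 4))
print("train bytes gap:", abs(estimated_train_bytes - reported_train_bytes))
print("test bytes gap:", abs(estimated_test_bytes - reported_test_bytes))
```

The train fraction comes out very close to 0.7, so the split is roughly 70/30, and the combined 172059 examples coincide with the example counts listed for `llama5` and `llama6`.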



