skeskinen/books3_basic_paragraphs
Source: Hugging Face. Updated 2023-06-14; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/skeskinen/books3_basic_paragraphs
Resource description:
---
dataset_info:
  features:
  - name: text
    dtype: string
  - name: book
    dtype: string
  - name: pos
    dtype: float64
  - name: smog_index
    dtype: float64
  splits:
  - name: train
    num_bytes: 1366299770
    num_examples: 6639751
  download_size: 676098743
  dataset_size: 1366299770
---
# Dataset Card for "books3_basic_paragraphs"
Books from the_pile books3 with a SMOG grade difficulty estimate of 6.5 or under, split into paragraphs. Most 'non-paragraphs', such as titles and tables of contents, have been filtered out.
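For readers unfamiliar with the SMOG grade, the sketch below shows the commonly cited McLaughlin formula and the 6.5 cutoff mentioned above. The card does not say which implementation or syllable counter produced `smog_index`, so the function names and the assumption that this exact formula was used are illustrative only.

```python
import math


def smog_grade(polysyllable_count: int, sentence_count: int) -> float:
    """Standard SMOG grade formula (McLaughlin).

    polysyllable_count: number of words with three or more syllables.
    sentence_count: number of sentences those words were counted in.
    Assumption: the dataset's smog_index may have been computed differently.
    """
    if sentence_count <= 0:
        raise ValueError("sentence_count must be positive")
    return 1.0430 * math.sqrt(polysyllable_count * 30 / sentence_count) + 3.1291


def is_basic(grade: float, threshold: float = 6.5) -> bool:
    """True when a grade is at or under the 6.5 cutoff named in the card."""
    return grade <= threshold
```

Under this formula, a text with no polysyllabic words scores the constant 3.1291, and the grade grows with the square root of polysyllable density.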
Provided by:
skeskinen
Summary of original information
Dataset overview
Dataset name
- Name: books3_basic_paragraphs
Dataset features
- Feature list:
  - text: string
  - book: string
  - pos: float64
  - smog_index: float64
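As a rough illustration of how rows with these four features might be handled, the sketch below models one row as a `TypedDict` and filters rows by `smog_index`. The names `ParagraphRow`, `rows_at_or_below`, and `stream_train_split` are hypothetical. The loader assumes the Hugging Face `datasets` package is installed and that the repository is reachable; it is imported lazily so the other helpers work without it.

```python
from typing import Iterable, Iterator, TypedDict


class ParagraphRow(TypedDict):
    text: str          # paragraph text
    book: str          # book identifier (exact format not documented in the card)
    pos: float         # meaning not documented in the card
    smog_index: float  # SMOG difficulty estimate


def rows_at_or_below(rows: Iterable[ParagraphRow], max_smog: float) -> Iterator[ParagraphRow]:
    """Yield only the rows whose smog_index does not exceed max_smog."""
    for row in rows:
        if row["smog_index"] <= max_smog:
            yield row


def stream_train_split():
    """Stream the train split without downloading it in full.

    Assumption: requires `pip install datasets` and network access.
    """
    from datasets import load_dataset  # lazy import, see lead-in

    return load_dataset(
        "skeskinen/books3_basic_paragraphs", split="train", streaming=True
    )
```

A call such as `rows_at_or_below(stream_train_split(), 5.0)` would then iterate lazily over the easier paragraphs, assuming the streamed rows expose the same four keys.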
Dataset splits
- Train split:
  - Number of examples: 6639751
  - Data size: 1366299770 bytes
Dataset size
- Download size: 676098743 bytes
- Total dataset size: 1366299770 bytes
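The figures above can be turned into two quick derived numbers, sketched here using only the values listed in the card. The variable names are illustrative.

```python
# Values copied from the dataset metadata above.
NUM_BYTES = 1_366_299_770      # total dataset size in bytes
NUM_EXAMPLES = 6_639_751       # paragraphs in the train split
DOWNLOAD_SIZE = 676_098_743    # download size in bytes

# Average stored size of one example (all four fields, not just the text).
avg_bytes_per_example = NUM_BYTES / NUM_EXAMPLES

# Fraction of the full dataset size that has to be downloaded.
download_ratio = DOWNLOAD_SIZE / NUM_BYTES

print(f"average bytes per example: {avg_bytes_per_example:.1f}")
print(f"download size / dataset size: {download_ratio:.3f}")
```

This works out to roughly 206 bytes per example, with the download being a little under half the size of the full dataset.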



