davanstrien/blbooks
Source: Hugging Face. Updated: 2023-11-28. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/davanstrien/blbooks
Resource description:
---
dataset_info:
- config_name: '1500_1899'
  features:
  - name: record_id
    dtype: string
  - name: date
    dtype: timestamp[s]
  - name: raw_date
    dtype: string
  - name: title
    dtype: string
  - name: place
    dtype: string
  - name: empty_pg
    dtype: bool
  - name: text
    dtype: string
  - name: pg
    dtype: int32
  - name: mean_wc_ocr
    dtype: float32
  - name: std_wc_ocr
    dtype: float64
  - name: name
    dtype: string
  - name: all_names
    dtype: string
  - name: Publisher
    dtype: string
  - name: Country of publication 1
    dtype: string
  - name: all Countries of publication
    dtype: string
  - name: Physical description
    dtype: string
  - name: Language_1
    dtype: string
  - name: Language_2
    dtype: string
  - name: Language_3
    dtype: string
  - name: Language_4
    dtype: string
  - name: multi_language
    dtype: bool
  splits:
  - name: train
    num_bytes: 30447672419
    num_examples: 14011953
  download_size: 16418278808
  dataset_size: 30447672419
- config_name: '1510_1699'
  features:
  - name: record_id
    dtype: string
  - name: date
    dtype: timestamp[s]
  - name: raw_date
    dtype: string
  - name: title
    dtype: string
  - name: place
    dtype: string
  - name: empty_pg
    dtype: bool
  - name: text
    dtype: string
  - name: pg
    dtype: int32
  - name: mean_wc_ocr
    dtype: float32
  - name: std_wc_ocr
    dtype: float64
  - name: name
    dtype: string
  - name: all_names
    dtype: string
  - name: Publisher
    dtype: string
  - name: Country of publication 1
    dtype: string
  - name: all Countries of publication
    dtype: string
  - name: Physical description
    dtype: string
  - name: Language_1
    dtype: string
  - name: Language_2
    dtype: string
  - name: Language_3
    dtype: string
  - name: Language_4
    dtype: string
  - name: multi_language
    dtype: bool
  splits:
  - name: train
    num_bytes: 107654867
    num_examples: 51982
  download_size: 64550493
  dataset_size: 107654867
- config_name: '1700_1799'
  features:
  - name: record_id
    dtype: string
  - name: date
    dtype: timestamp[s]
  - name: raw_date
    dtype: string
  - name: title
    dtype: string
  - name: place
    dtype: string
  - name: empty_pg
    dtype: bool
  - name: text
    dtype: string
  - name: pg
    dtype: int32
  - name: mean_wc_ocr
    dtype: float32
  - name: std_wc_ocr
    dtype: float64
  - name: name
    dtype: string
  - name: all_names
    dtype: string
  - name: Publisher
    dtype: string
  - name: Country of publication 1
    dtype: string
  - name: all Countries of publication
    dtype: string
  - name: Physical description
    dtype: string
  - name: Language_1
    dtype: string
  - name: Language_2
    dtype: string
  - name: Language_3
    dtype: string
  - name: Language_4
    dtype: string
  - name: multi_language
    dtype: bool
  splits:
  - name: train
    num_bytes: 267068570
    num_examples: 178224
  download_size: 143916194
  dataset_size: 267068570
- config_name: '1800_1899'
  features:
  - name: record_id
    dtype: string
  - name: date
    dtype: timestamp[s]
  - name: raw_date
    dtype: string
  - name: title
    dtype: string
  - name: place
    dtype: string
  - name: empty_pg
    dtype: bool
  - name: text
    dtype: string
  - name: pg
    dtype: int32
  - name: mean_wc_ocr
    dtype: float32
  - name: std_wc_ocr
    dtype: float64
  - name: name
    dtype: string
  - name: all_names
    dtype: string
  - name: Publisher
    dtype: string
  - name: Country of publication 1
    dtype: string
  - name: all Countries of publication
    dtype: string
  - name: Physical description
    dtype: string
  - name: Language_1
    dtype: string
  - name: Language_2
    dtype: string
  - name: Language_3
    dtype: string
  - name: Language_4
    dtype: string
  - name: multi_language
    dtype: bool
  splits:
  - name: train
    num_bytes: 30072947637
    num_examples: 13781747
  download_size: 16208823069
  dataset_size: 30072947637
configs:
- config_name: '1500_1899'
  data_files:
  - split: train
    path: 1500_1899/train-*
- config_name: '1510_1699'
  data_files:
  - split: train
    path: 1510_1699/train-*
- config_name: '1700_1799'
  data_files:
  - split: train
    path: 1700_1799/train-*
- config_name: '1800_1899'
  data_files:
  - split: train
    path: 1800_1899/train-*
---
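The feature list in the metadata includes a per-page `empty_pg` flag and a `mean_wc_ocr` value, which suggests a simple way to drop unusable pages before further processing. The sketch below is only illustrative: it assumes `mean_wc_ocr` is a mean OCR word-confidence score on a 0 to 1 scale (the source does not define it), and the 0.7 threshold, the `usable_pages` helper, and the sample rows are all hypothetical.

```python
from typing import Any, Iterable, Iterator, Mapping


def usable_pages(
    rows: Iterable[Mapping[str, Any]],
    min_confidence: float = 0.7,  # hypothetical threshold, not from the source
) -> Iterator[Mapping[str, Any]]:
    """Yield rows that are not flagged empty and meet the confidence threshold."""
    for row in rows:
        if row["empty_pg"]:
            continue  # page flagged as empty in the metadata
        confidence = row["mean_wc_ocr"]
        # Assumption: mean_wc_ocr is a 0-1 confidence; missing values are skipped.
        if confidence is None or confidence < min_confidence:
            continue
        yield row


# Made-up rows using a subset of the declared fields (record_id, pg, empty_pg,
# mean_wc_ocr, text); they are not taken from the dataset.
sample_rows = [
    {"record_id": "a", "pg": 1, "empty_pg": True, "mean_wc_ocr": None, "text": ""},
    {"record_id": "a", "pg": 2, "empty_pg": False, "mean_wc_ocr": 0.91, "text": "Chapter I"},
    {"record_id": "a", "pg": 3, "empty_pg": False, "mean_wc_ocr": 0.42, "text": "Ch@pt3r"},
]

kept_pages = [row["pg"] for row in usable_pages(sample_rows)]
print(kept_pages)
```

The same generator works on any iterable of dict-like rows, so it could be applied to a streamed split without loading a whole configuration into memory.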
Provider:
davanstrien
Summary of original information
Dataset overview
Dataset configurations
Configuration 1500_1899
- Features:
  record_id: string; date: timestamp[s]; raw_date: string; title: string; place: string; empty_pg: bool; text: string; pg: int32; mean_wc_ocr: float32; std_wc_ocr: float64; name: string; all_names: string; Publisher: string; Country of publication 1: string; all Countries of publication: string; Physical description: string; Language_1: string; Language_2: string; Language_3: string; Language_4: string; multi_language: bool
- Splits:
  train: num_bytes: 30447672419; num_examples: 14011953
- Download size: 16418278808 bytes
- Dataset size: 30447672419 bytes
Configuration 1510_1699
- Features:
  record_id: string; date: timestamp[s]; raw_date: string; title: string; place: string; empty_pg: bool; text: string; pg: int32; mean_wc_ocr: float32; std_wc_ocr: float64; name: string; all_names: string; Publisher: string; Country of publication 1: string; all Countries of publication: string; Physical description: string; Language_1: string; Language_2: string; Language_3: string; Language_4: string; multi_language: bool
- Splits:
  train: num_bytes: 107654867; num_examples: 51982
- Download size: 64550493 bytes
- Dataset size: 107654867 bytes
Configuration 1700_1799
- Features:
  record_id: string; date: timestamp[s]; raw_date: string; title: string; place: string; empty_pg: bool; text: string; pg: int32; mean_wc_ocr: float32; std_wc_ocr: float64; name: string; all_names: string; Publisher: string; Country of publication 1: string; all Countries of publication: string; Physical description: string; Language_1: string; Language_2: string; Language_3: string; Language_4: string; multi_language: bool
- Splits:
  train: num_bytes: 267068570; num_examples: 178224
- Download size: 143916194 bytes
- Dataset size: 267068570 bytes
Configuration 1800_1899
- Features:
  record_id: string; date: timestamp[s]; raw_date: string; title: string; place: string; empty_pg: bool; text: string; pg: int32; mean_wc_ocr: float32; std_wc_ocr: float64; name: string; all_names: string; Publisher: string; Country of publication 1: string; all Countries of publication: string; Physical description: string; Language_1: string; Language_2: string; Language_3: string; Language_4: string; multi_language: bool
- Splits:
  train: num_bytes: 30072947637; num_examples: 13781747
- Download size: 16208823069 bytes
- Dataset size: 30072947637 bytes
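The figures listed for the four configurations can be cross-checked with a little arithmetic. The sketch below copies the row counts and byte sizes from the metadata and compares the three narrower configurations against `1500_1899`. That the wide configuration is the concatenation of the other three is an inference from the numbers, not something the source states, and the `gib` helper is illustrative only.

```python
# (num_examples, download_size, dataset_size), copied from the metadata.
CONFIG_STATS = {
    "1500_1899": (14011953, 16418278808, 30447672419),
    "1510_1699": (51982, 64550493, 107654867),
    "1700_1799": (178224, 143916194, 267068570),
    "1800_1899": (13781747, 16208823069, 30072947637),
}


def gib(n_bytes: int) -> float:
    """Convert a byte count to gibibytes (2**30 bytes)."""
    return n_bytes / 2**30


# Row counts of the three narrower configurations add up to the wide one.
narrow_configs = ["1510_1699", "1700_1799", "1800_1899"]
rows_in_narrow = sum(CONFIG_STATS[name][0] for name in narrow_configs)
rows_match = rows_in_narrow == CONFIG_STATS["1500_1899"][0]
print("row counts match:", rows_match)

# Per-configuration sizes and the download-to-dataset size ratio.
for name, (rows, download, size) in CONFIG_STATS.items():
    print(
        f"{name}: {rows:,} rows, "
        f"download {gib(download):.2f} GiB, "
        f"dataset {gib(size):.2f} GiB, "
        f"ratio {download / size:.2f}"
    )
```

The byte sizes do not add up exactly in the same way (the sum of the three `dataset_size` values is slightly below the `1500_1899` figure), so only the row counts should be read as an exact match.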
Data file paths
- Configuration 1500_1899:
  train: 1500_1899/train-*
- Configuration 1510_1699:
  train: 1510_1699/train-*
- Configuration 1700_1799:
  train: 1700_1799/train-*
- Configuration 1800_1899:
  train: 1800_1899/train-*
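Because each configuration maps to its own set of `train-*` files, a single configuration can be requested by name instead of downloading everything. The following is a minimal sketch, assuming the Hugging Face `datasets` library is installed and that each configuration name encodes an inclusive range of publication years (an assumption based on the names; the source does not say so). The helpers `config_for_year` and `stream_config` are hypothetical names introduced here.

```python
# Configuration names from the metadata; the wide '1500_1899' config is left
# out of the lookup so that each year maps to exactly one narrow config.
NARROW_CONFIGS = ["1510_1699", "1700_1799", "1800_1899"]


def year_range(config_name: str) -> tuple[int, int]:
    """Parse 'START_END' into (start, end). Assumes the name encodes years."""
    start, end = config_name.split("_")
    return int(start), int(end)


def config_for_year(year: int) -> str:
    """Return the narrow configuration whose assumed year range covers `year`."""
    for name in NARROW_CONFIGS:
        start, end = year_range(name)
        if start <= year <= end:
            return name
    raise ValueError(f"no configuration covers year {year}")


def stream_config(config_name: str):
    """Open one configuration's train split lazily (needs network access)."""
    # Imported here so the helpers above work without the third-party package.
    from datasets import load_dataset

    return load_dataset(
        "davanstrien/blbooks", config_name, split="train", streaming=True
    )


chosen = config_for_year(1755)
print(chosen)
# To read rows, iterate over the stream, for example:
#     first_row = next(iter(stream_config(chosen)))
```

Streaming avoids fetching the full download up front, which matters most for the `1800_1899` and `1500_1899` configurations, each listed at roughly 16 GB of download.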



