
davanstrien/blbooks

Source: Hugging Face | Updated 2023-11-28 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/davanstrien/blbooks
Resource description:
---
dataset_info:
- config_name: '1500_1899'
  features:
  - name: record_id
    dtype: string
  - name: date
    dtype: timestamp[s]
  - name: raw_date
    dtype: string
  - name: title
    dtype: string
  - name: place
    dtype: string
  - name: empty_pg
    dtype: bool
  - name: text
    dtype: string
  - name: pg
    dtype: int32
  - name: mean_wc_ocr
    dtype: float32
  - name: std_wc_ocr
    dtype: float64
  - name: name
    dtype: string
  - name: all_names
    dtype: string
  - name: Publisher
    dtype: string
  - name: Country of publication 1
    dtype: string
  - name: all Countries of publication
    dtype: string
  - name: Physical description
    dtype: string
  - name: Language_1
    dtype: string
  - name: Language_2
    dtype: string
  - name: Language_3
    dtype: string
  - name: Language_4
    dtype: string
  - name: multi_language
    dtype: bool
  splits:
  - name: train
    num_bytes: 30447672419
    num_examples: 14011953
  download_size: 16418278808
  dataset_size: 30447672419
- config_name: '1510_1699'
  features:
  - name: record_id
    dtype: string
  - name: date
    dtype: timestamp[s]
  - name: raw_date
    dtype: string
  - name: title
    dtype: string
  - name: place
    dtype: string
  - name: empty_pg
    dtype: bool
  - name: text
    dtype: string
  - name: pg
    dtype: int32
  - name: mean_wc_ocr
    dtype: float32
  - name: std_wc_ocr
    dtype: float64
  - name: name
    dtype: string
  - name: all_names
    dtype: string
  - name: Publisher
    dtype: string
  - name: Country of publication 1
    dtype: string
  - name: all Countries of publication
    dtype: string
  - name: Physical description
    dtype: string
  - name: Language_1
    dtype: string
  - name: Language_2
    dtype: string
  - name: Language_3
    dtype: string
  - name: Language_4
    dtype: string
  - name: multi_language
    dtype: bool
  splits:
  - name: train
    num_bytes: 107654867
    num_examples: 51982
  download_size: 64550493
  dataset_size: 107654867
- config_name: '1700_1799'
  features:
  - name: record_id
    dtype: string
  - name: date
    dtype: timestamp[s]
  - name: raw_date
    dtype: string
  - name: title
    dtype: string
  - name: place
    dtype: string
  - name: empty_pg
    dtype: bool
  - name: text
    dtype: string
  - name: pg
    dtype: int32
  - name: mean_wc_ocr
    dtype: float32
  - name: std_wc_ocr
    dtype: float64
  - name: name
    dtype: string
  - name: all_names
    dtype: string
  - name: Publisher
    dtype: string
  - name: Country of publication 1
    dtype: string
  - name: all Countries of publication
    dtype: string
  - name: Physical description
    dtype: string
  - name: Language_1
    dtype: string
  - name: Language_2
    dtype: string
  - name: Language_3
    dtype: string
  - name: Language_4
    dtype: string
  - name: multi_language
    dtype: bool
  splits:
  - name: train
    num_bytes: 267068570
    num_examples: 178224
  download_size: 143916194
  dataset_size: 267068570
- config_name: '1800_1899'
  features:
  - name: record_id
    dtype: string
  - name: date
    dtype: timestamp[s]
  - name: raw_date
    dtype: string
  - name: title
    dtype: string
  - name: place
    dtype: string
  - name: empty_pg
    dtype: bool
  - name: text
    dtype: string
  - name: pg
    dtype: int32
  - name: mean_wc_ocr
    dtype: float32
  - name: std_wc_ocr
    dtype: float64
  - name: name
    dtype: string
  - name: all_names
    dtype: string
  - name: Publisher
    dtype: string
  - name: Country of publication 1
    dtype: string
  - name: all Countries of publication
    dtype: string
  - name: Physical description
    dtype: string
  - name: Language_1
    dtype: string
  - name: Language_2
    dtype: string
  - name: Language_3
    dtype: string
  - name: Language_4
    dtype: string
  - name: multi_language
    dtype: bool
  splits:
  - name: train
    num_bytes: 30072947637
    num_examples: 13781747
  download_size: 16208823069
  dataset_size: 30072947637
configs:
- config_name: '1500_1899'
  data_files:
  - split: train
    path: 1500_1899/train-*
- config_name: '1510_1699'
  data_files:
  - split: train
    path: 1510_1699/train-*
- config_name: '1700_1799'
  data_files:
  - split: train
    path: 1700_1799/train-*
- config_name: '1800_1899'
  data_files:
  - split: train
    path: 1800_1899/train-*
---
Provided by:
davanstrien
Summary of original information

Dataset overview

Dataset configurations

Configuration 1500_1899

  • Features:
    • record_id: string
    • date: timestamp[s]
    • raw_date: string
    • title: string
    • place: string
    • empty_pg: bool
    • text: string
    • pg: int32
    • mean_wc_ocr: float32
    • std_wc_ocr: float64
    • name: string
    • all_names: string
    • Publisher: string
    • Country of publication 1: string
    • all Countries of publication: string
    • Physical description: string
    • Language_1: string
    • Language_2: string
    • Language_3: string
    • Language_4: string
    • multi_language: bool
  • Splits:
    • train:
      • num_bytes: 30447672419
      • num_examples: 14011953
  • Download size: 16418278808 bytes
  • Dataset size: 30447672419 bytes
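
The raw byte counts above are hard to read at a glance. The following is a minimal sketch that converts the 1500_1899 figures into GiB and derives two rough indicators: the average size of one example and the ratio of dataset size to download size. The variable and function names are illustrative, and reading the ratio as a compression factor is an assumption about how the downloaded files relate to the prepared dataset.

```python
# Figures copied from the 1500_1899 configuration listed above (all in bytes).
NUM_BYTES = 30_447_672_419       # num_bytes / dataset size
NUM_EXAMPLES = 14_011_953        # rows in the train split
DOWNLOAD_SIZE = 16_418_278_808   # download size

GIB = 1024 ** 3  # bytes per gibibyte


def to_gib(n_bytes: int) -> float:
    """Convert a byte count to gibibytes."""
    return n_bytes / GIB


dataset_gib = to_gib(NUM_BYTES)
download_gib = to_gib(DOWNLOAD_SIZE)

# Average size of one example, in bytes.
bytes_per_example = NUM_BYTES / NUM_EXAMPLES

# How much larger the prepared dataset is than the files that are downloaded.
size_ratio = NUM_BYTES / DOWNLOAD_SIZE

print(f"dataset size : {dataset_gib:.2f} GiB")
print(f"download size: {download_gib:.2f} GiB")
print(f"avg example  : {bytes_per_example:.0f} bytes")
print(f"size ratio   : {size_ratio:.2f}x")
```

The same calculation applies to the other configurations by swapping in their figures.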

Configuration 1510_1699

  • Features:
    • record_id: string
    • date: timestamp[s]
    • raw_date: string
    • title: string
    • place: string
    • empty_pg: bool
    • text: string
    • pg: int32
    • mean_wc_ocr: float32
    • std_wc_ocr: float64
    • name: string
    • all_names: string
    • Publisher: string
    • Country of publication 1: string
    • all Countries of publication: string
    • Physical description: string
    • Language_1: string
    • Language_2: string
    • Language_3: string
    • Language_4: string
    • multi_language: bool
  • Splits:
    • train:
      • num_bytes: 107654867
      • num_examples: 51982
  • Download size: 64550493 bytes
  • Dataset size: 107654867 bytes
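
The feature list can be mirrored as a typed record, which makes simple filtering easier to reason about. The sketch below is illustrative only. It assumes each row describes one page (as `pg` and `empty_pg` suggest) and that `mean_wc_ocr` is an OCR word-confidence score between 0 and 1; neither point is stated on this page. `PageRecord`, `usable_pages`, the 0.7 threshold, and the sample rows are all hypothetical.

```python
from typing import Iterable, Iterator, TypedDict


class PageRecord(TypedDict, total=False):
    """Subset of the listed features; the field names match the feature list."""
    record_id: str
    title: str
    text: str
    pg: int
    empty_pg: bool
    mean_wc_ocr: float
    Language_1: str
    multi_language: bool


def usable_pages(
    records: Iterable[PageRecord], min_confidence: float = 0.7
) -> Iterator[PageRecord]:
    """Yield non-empty pages whose mean OCR confidence meets the threshold.

    Assumption: mean_wc_ocr lies in [0, 1]; the threshold is arbitrary.
    """
    for rec in records:
        if rec.get("empty_pg"):
            continue  # skip pages flagged as empty
        confidence = rec.get("mean_wc_ocr")
        if confidence is None or confidence < min_confidence:
            continue  # skip pages with missing or low OCR confidence
        yield rec


# Hypothetical rows, made up for illustration (not taken from the dataset).
sample: list[PageRecord] = [
    {"record_id": "x1", "pg": 1, "empty_pg": True, "text": "", "mean_wc_ocr": 0.0},
    {"record_id": "x1", "pg": 2, "empty_pg": False, "text": "Chapter I", "mean_wc_ocr": 0.91},
    {"record_id": "x1", "pg": 3, "empty_pg": False, "text": "ilegble", "mean_wc_ocr": 0.35},
]

kept = list(usable_pages(sample))
print([rec["pg"] for rec in kept])
```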

Configuration 1700_1799

  • Features:
    • record_id: string
    • date: timestamp[s]
    • raw_date: string
    • title: string
    • place: string
    • empty_pg: bool
    • text: string
    • pg: int32
    • mean_wc_ocr: float32
    • std_wc_ocr: float64
    • name: string
    • all_names: string
    • Publisher: string
    • Country of publication 1: string
    • all Countries of publication: string
    • Physical description: string
    • Language_1: string
    • Language_2: string
    • Language_3: string
    • Language_4: string
    • multi_language: bool
  • Splits:
    • train:
      • num_bytes: 267068570
      • num_examples: 178224
  • Download size: 143916194 bytes
  • Dataset size: 267068570 bytes

Configuration 1800_1899

  • Features:
    • record_id: string
    • date: timestamp[s]
    • raw_date: string
    • title: string
    • place: string
    • empty_pg: bool
    • text: string
    • pg: int32
    • mean_wc_ocr: float32
    • std_wc_ocr: float64
    • name: string
    • all_names: string
    • Publisher: string
    • Country of publication 1: string
    • all Countries of publication: string
    • Physical description: string
    • Language_1: string
    • Language_2: string
    • Language_3: string
    • Language_4: string
    • multi_language: bool
  • Splits:
    • train:
      • num_bytes: 30072947637
      • num_examples: 13781747
  • Download size: 16208823069 bytes
  • Dataset size: 30072947637 bytes
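
The configuration names suggest that 1500_1899 covers the same material as the three narrower date ranges combined. This page does not say so explicitly, so the sketch below checks the idea against the listed figures. The dictionary layout and helper names are illustrative.

```python
# Figures copied from the four configurations listed above.
CONFIGS = {
    "1500_1899": {"num_bytes": 30_447_672_419, "num_examples": 14_011_953,
                  "download_size": 16_418_278_808},
    "1510_1699": {"num_bytes": 107_654_867, "num_examples": 51_982,
                  "download_size": 64_550_493},
    "1700_1799": {"num_bytes": 267_068_570, "num_examples": 178_224,
                  "download_size": 143_916_194},
    "1800_1899": {"num_bytes": 30_072_947_637, "num_examples": 13_781_747,
                  "download_size": 16_208_823_069},
}

COMBINED = "1500_1899"
PARTS = ("1510_1699", "1700_1799", "1800_1899")


def parts_total(field: str) -> int:
    """Sum one field over the three narrower date-range configurations."""
    return sum(CONFIGS[name][field] for name in PARTS)


def gap(field: str) -> int:
    """Combined-config value minus the sum over the parts (0 means exact match)."""
    return CONFIGS[COMBINED][field] - parts_total(field)


examples_gap = gap("num_examples")
bytes_gap = gap("num_bytes")
download_gap = gap("download_size")

for field in ("num_examples", "num_bytes", "download_size"):
    print(f"{field}: combined minus parts = {gap(field)}")
```

With the listed figures, the example counts of the three narrower ranges add up exactly to the 1500_1899 count, while the byte sizes differ slightly. That is consistent with the combined configuration holding the same rows in separately written files, though the page itself does not confirm it.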

Data file paths

  • Configuration 1500_1899:
    • train: 1500_1899/train-*
  • Configuration 1510_1699:
    • train: 1510_1699/train-*
  • Configuration 1700_1799:
    • train: 1700_1799/train-*
  • Configuration 1800_1899:
    • train: 1800_1899/train-*
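
Each configuration maps its train split to a glob pattern. The sketch below shows how a file path can be matched to a configuration and how one configuration might be loaded with the Hugging Face `datasets` library. It assumes the repository ID `davanstrien/blbooks` resolves on the Hub and that `datasets` is installed. `config_for_file` and `load_config` are illustrative helpers, and the shard file name in the usage lines is hypothetical.

```python
from fnmatch import fnmatch
from typing import Optional

REPO_ID = "davanstrien/blbooks"

# Glob patterns copied from the data file paths listed above.
DATA_FILES = {
    "1500_1899": "1500_1899/train-*",
    "1510_1699": "1510_1699/train-*",
    "1700_1799": "1700_1799/train-*",
    "1800_1899": "1800_1899/train-*",
}


def config_for_file(path: str) -> Optional[str]:
    """Return the configuration whose train pattern matches the path, if any."""
    for name, pattern in DATA_FILES.items():
        if fnmatch(path, pattern):
            return name
    return None


def load_config(name: str, streaming: bool = True):
    """Load the train split of one configuration.

    Streaming is the default here because the larger configurations
    are tens of gigabytes. Requires the third-party `datasets` package
    and network access.
    """
    if name not in DATA_FILES:
        raise ValueError(f"unknown configuration: {name!r}")
    from datasets import load_dataset  # imported lazily so matching works offline

    return load_dataset(REPO_ID, name, split="train", streaming=streaming)


# Hypothetical shard name, used only to demonstrate the pattern match.
print(config_for_file("1510_1699/train-00000-of-00001.parquet"))
print(config_for_file("README.md"))
```

Calling `load_config("1510_1699")` would then return an iterable over the smallest configuration, which is a reasonable starting point for experiments.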