davanstrien/ai4lam-demo
Hugging Face. Updated 2022-11-14. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/davanstrien/ai4lam-demo
Resource description:
---
dataset_info:
  features:
  - name: record_id
    dtype: string
  - name: date
    dtype: timestamp[ns]
  - name: raw_date
    dtype: string
  - name: title
    dtype: string
  - name: place
    dtype: string
  - name: empty_pg
    dtype: bool
  - name: text
    dtype: string
  - name: pg
    dtype: int64
  - name: mean_wc_ocr
    dtype: float64
  - name: std_wc_ocr
    dtype: float64
  - name: name
    dtype: string
  - name: all_names
    dtype: string
  - name: Publisher
    dtype: string
  - name: Country of publication 1
    dtype: string
  - name: all Countries of publication
    dtype: string
  - name: Physical description
    dtype: string
  - name: Language_1
    dtype: string
  - name: Language_2
    dtype: string
  - name: Language_3
    dtype: 'null'
  - name: Language_4
    dtype: 'null'
  - name: multi_language
    dtype: bool
  splits:
  - name: train
    num_bytes: 5300866
    num_examples: 4148
  download_size: 2857751
  dataset_size: 5300866
---
# Dataset Card for "ai4lam-demo"
[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)
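
Since the card itself gives no usage notes, here is a minimal sketch of how the `train` split might be loaded. It assumes the Hugging Face `datasets` package is installed and that the repository is still published under the id shown on this page; the helper name `load_train_split` is hypothetical.

```python
REPO_ID = "davanstrien/ai4lam-demo"  # dataset id as listed on this page


def load_train_split():
    """Load the only split listed in the metadata (`train`)."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    # Assumption: the repository is still available on the Hugging Face Hub.
    return load_dataset(REPO_ID, split="train")
```

Calling `load_train_split()` needs network access. If the metadata above is still current, the returned dataset should report 4148 rows and the 21 columns listed in `dataset_info`.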
Provided by:
davanstrien
Summary of original information
Dataset information
Features
- record_id: string
- date: timestamp (timestamp[ns])
- raw_date: string
- title: string
- place: string
- empty_pg: boolean
- text: string
- pg: integer (int64)
- mean_wc_ocr: float (float64)
- std_wc_ocr: float (float64)
- name: string
- all_names: string
- Publisher: string
- Country of publication 1: string
- all Countries of publication: string
- Physical description: string
- Language_1: string
- Language_2: string
- Language_3: null
- Language_4: null
- multi_language: boolean
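
The feature list above suggests one row per page (`pg`), with `empty_pg` flagging pages that have no text and `record_id` tying pages to a catalogue record. That reading is an assumption, since the card does not document the columns. Under it, a rough sketch of rebuilding per-record text could look like this; the sample rows and the helper `text_by_record` are hypothetical and are not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical rows that follow the listed schema; not real records.
SAMPLE_ROWS = [
    {"record_id": "r1", "pg": 2, "empty_pg": False, "text": "second page"},
    {"record_id": "r1", "pg": 1, "empty_pg": False, "text": "first page"},
    {"record_id": "r1", "pg": 3, "empty_pg": True, "text": ""},
    {"record_id": "r2", "pg": 1, "empty_pg": False, "text": "another book"},
]


def text_by_record(rows):
    """Join the text of non-empty pages per record_id, in page order."""
    pages = defaultdict(list)
    for row in rows:
        if not row["empty_pg"]:  # skip pages flagged as empty
            pages[row["record_id"]].append((row["pg"], row["text"]))
    # Sort each record's pages by page number before joining.
    return {
        record_id: "\n".join(text for _, text in sorted(items))
        for record_id, items in pages.items()
    }


print(text_by_record(SAMPLE_ROWS))
```

The same function should work on rows yielded by iterating a loaded split, since each row is a plain mapping from column name to value.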
Splits
- train:
  - Bytes: 5300866
  - Examples: 4148
Dataset size
- Download size: 2857751 bytes
- Dataset size: 5300866 bytes
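
As a quick sanity check on the figures above, the sketch below derives two rough quantities from them: the average size of one example and the ratio of download size to dataset size. The variable names are illustrative; only the three numbers come from the metadata.

```python
NUM_BYTES = 5_300_866      # dataset_size (same as num_bytes of the train split)
DOWNLOAD_SIZE = 2_857_751  # download_size
NUM_EXAMPLES = 4_148       # num_examples in the train split

bytes_per_example = NUM_BYTES / NUM_EXAMPLES
download_ratio = DOWNLOAD_SIZE / NUM_BYTES

print(f"average bytes per example: {bytes_per_example:.1f}")
print(f"download size / dataset size: {download_ratio:.3f}")
```

This works out to roughly 1.3 KB per example, and the download is a little over half the size of the prepared dataset.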



