davanstrien/autogenerated-dataset-card

Name: davanstrien/autogenerated-dataset-card
Creator: davanstrien
Published: 2023-02-15 10:35:21
License: no description available

Source: Hugging Face. Updated 2023-02-15. Indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/davanstrien/autogenerated-dataset-card

Resource description:

---
dataset_info:
  features:
  - name: file
    dtype: string
  - name: image
    dtype: image
  - name: label
    dtype:
      class_label:
        names:
          '0': text-only
          '1': illustrations
  - name: pub_date
    dtype: timestamp[ns]
  - name: page_seq_num
    dtype: int64
  - name: edition_seq_num
    dtype: int64
  - name: batch
    dtype: string
  - name: lccn
    dtype: string
  - name: box
    sequence: float32
  - name: score
    dtype: float64
  - name: ocr
    dtype: string
  - name: place_of_publication
    dtype: string
  - name: geographic_coverage
    dtype: string
  - name: name
    dtype: string
  - name: publisher
    dtype: string
  - name: url
    dtype: string
  - name: page_url
    dtype: string
  splits:
  - name: train
    num_bytes: 48233952
    num_examples: 549
  download_size: 48027719
  dataset_size: 48233952
size_categories:
- n<1K
---

# Dataset Card for "test_dataset_cogapp"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

The first row of the dataset looks like

```
{
  "file": "pst_fenske_ver02_data_sn84026497_00280776129_1880042101_0834_002_6_96.jpg",
  "image": "<PIL.JpegImagePlugin.JpegImageFile image mode=L size=388x395 at 0x11AF00990>",
  "label": "0",
  "pub_date": "1880-04-21 00:00:00",
  "page_seq_num": "834",
  "edition_seq_num": "1",
  "batch": "pst_fenske_ver02",
  "lccn": "sn84026497",
  "box": "[0.649412214756012, 0.6045778393745422, 0.8002520799636841, 0.7152365446090698]",
  "score": "0.9609346985816956",
  "ocr": "H. II. IIASLKT & SOXN, Dealers in General Merchandise In New Store Room nt HASLET'S COS ITERS, 'JTionoMtii, ln. .Tau'y 1st, 1?0.",
  "place_of_publication": "Tionesta, Pa.",
  "geographic_coverage": "['Pennsylvania--Forest--Tionesta']",
  "name": "The Forest Republican. [volume]",
  "publisher": "Ed. W. Smiley",
  "url": "https://news-navigator.labs.loc.gov/data/pst_fenske_ver02/data/sn84026497/00280776129/1880042101/0834/002_6_96.jpg",
  "page_url": "https://chroniclingamerica.loc.gov/data/batches/pst_fenske_ver02/data/sn84026497/00280776129/1880042101/0834.jp2"
}
```

# Label breakdowns

```
{'train': {'text-only': 376, 'illustrations': 173}}
```

# Label breakdown table for split: train

| Label         | Count | Percentage |
|---------------|-------|------------|
| text-only     | 376   | 68.49%     |
| illustrations | 173   | 31.51%     |
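
The sample row in the card is printed with every value as a string, while the schema declares integers, floats, a timestamp, and a float sequence. The following is a minimal sketch of converting such a stringified row back to typed values. It assumes you only have the row as displayed; `parse_row` and `LABEL_NAMES` are hypothetical names introduced here for illustration, and only a subset of the fields is copied.

```python
import ast
from datetime import datetime

# Subset of the sample row from the card; every value is a string as displayed.
raw_row = {
    "label": "0",
    "pub_date": "1880-04-21 00:00:00",
    "page_seq_num": "834",
    "edition_seq_num": "1",
    "box": "[0.649412214756012, 0.6045778393745422, 0.8002520799636841, 0.7152365446090698]",
    "score": "0.9609346985816956",
    "geographic_coverage": "['Pennsylvania--Forest--Tionesta']",
}

# Class names taken from the class_label entry in the card's schema.
LABEL_NAMES = {0: "text-only", 1: "illustrations"}


def parse_row(row):
    """Hypothetical helper: convert string-encoded fields to typed values."""
    return {
        "label": LABEL_NAMES[int(row["label"])],
        "pub_date": datetime.strptime(row["pub_date"], "%Y-%m-%d %H:%M:%S"),
        "page_seq_num": int(row["page_seq_num"]),
        "edition_seq_num": int(row["edition_seq_num"]),
        # literal_eval safely parses the list literals without executing code.
        "box": ast.literal_eval(row["box"]),
        "score": float(row["score"]),
        "geographic_coverage": ast.literal_eval(row["geographic_coverage"]),
    }


parsed = parse_row(raw_row)
```

`ast.literal_eval` is used instead of `eval` because the list-like fields come from external data and should not be executed as code.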

Provided by:

davanstrien

Summary of original information

Dataset overview

Dataset features

file: string
image: image
label: class label with two classes:
- 0: text-only
- 1: illustrations
pub_date: timestamp, nanosecond precision
page_seq_num: integer (int64)
edition_seq_num: integer (int64)
batch: string
lccn: string
box: sequence of floats (float32)
score: float (float64)
ocr: string
place_of_publication: string
geographic_coverage: string
name: string
publisher: string
url: string
page_url: string
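
A feature list like the one above can be checked against the hosted data by loading it with the Hugging Face `datasets` library. This is a sketch only: it assumes the repository named on this page still exists and hosts the data described here, and `load_train_split` is a hypothetical helper name.

```python
# Repository name as given on this page; whether it still resolves is an assumption.
REPO_ID = "davanstrien/autogenerated-dataset-card"


def load_train_split(repo_id=REPO_ID):
    """Hypothetical helper: load the train split and return it with its schema."""
    # Imported lazily so the module can be read without `datasets` installed.
    from datasets import load_dataset

    ds = load_dataset(repo_id, split="train")
    # ds.features maps each column name to its declared feature type.
    return ds, ds.features
```

After `ds, features = load_train_split()`, printing `features` shows the declared column types, and `features["label"].int2str(0)` maps a class index back to its name if `label` is a `ClassLabel` as the card states.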

Dataset splits

train:
- Size: 48233952 bytes
- Examples: 549

Label distribution

The label distribution of the train split is:
- text-only: 376 (68.49%)
- illustrations: 173 (31.51%)
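
The percentages follow directly from the counts and the split size of 549. A short check, using only the numbers stated above:

```python
# Label counts for the train split, as reported in the card.
label_counts = {"text-only": 376, "illustrations": 173}

# The counts should add up to the number of examples in the split.
total = sum(label_counts.values())

# Share of each label as a percentage, rounded to two decimals.
percentages = {
    label: round(100 * count / total, 2)
    for label, count in label_counts.items()
}

print(total)        # 549
print(percentages)  # {'text-only': 68.49, 'illustrations': 31.51}
```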
