
davanstrien/autogenerated-dataset-card

Hugging Face, updated 2023-02-15, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/davanstrien/autogenerated-dataset-card
Resource description:
---
dataset_info:
  features:
  - name: file
    dtype: string
  - name: image
    dtype: image
  - name: label
    dtype:
      class_label:
        names:
          '0': text-only
          '1': illustrations
  - name: pub_date
    dtype: timestamp[ns]
  - name: page_seq_num
    dtype: int64
  - name: edition_seq_num
    dtype: int64
  - name: batch
    dtype: string
  - name: lccn
    dtype: string
  - name: box
    sequence: float32
  - name: score
    dtype: float64
  - name: ocr
    dtype: string
  - name: place_of_publication
    dtype: string
  - name: geographic_coverage
    dtype: string
  - name: name
    dtype: string
  - name: publisher
    dtype: string
  - name: url
    dtype: string
  - name: page_url
    dtype: string
  splits:
  - name: train
    num_bytes: 48233952
    num_examples: 549
  download_size: 48027719
  dataset_size: 48233952
size_categories:
- n<1K
---

# Dataset Card for "test_dataset_cogapp"

[More Information needed](https://github.com/huggingface/datasets/blob/main/CONTRIBUTING.md#how-to-contribute-to-the-dataset-cards)

The first row of the dataset looks like

<!-- [[[cog
from datasets import load_dataset
import json
ds = load_dataset("davanstrien/test_dataset_cogapp")
data = ds['train'][0]
example = json.dumps({k: str(v) for k,v in data.items()}, indent=2)
cog.out("```\n{}\n```".format(example))
]]] -->
```
{
  "file": "pst_fenske_ver02_data_sn84026497_00280776129_1880042101_0834_002_6_96.jpg",
  "image": "<PIL.JpegImagePlugin.JpegImageFile image mode=L size=388x395 at 0x11AF00990>",
  "label": "0",
  "pub_date": "1880-04-21 00:00:00",
  "page_seq_num": "834",
  "edition_seq_num": "1",
  "batch": "pst_fenske_ver02",
  "lccn": "sn84026497",
  "box": "[0.649412214756012, 0.6045778393745422, 0.8002520799636841, 0.7152365446090698]",
  "score": "0.9609346985816956",
  "ocr": "H. II. IIASLKT & SOXN, Dealers in General Merchandise In New Store Room nt HASLET'S COS ITERS, 'JTionoMtii, ln. .Tau'y 1st, 1?0.",
  "place_of_publication": "Tionesta, Pa.",
  "geographic_coverage": "['Pennsylvania--Forest--Tionesta']",
  "name": "The Forest Republican. [volume]",
  "publisher": "Ed. W. Smiley",
  "url": "https://news-navigator.labs.loc.gov/data/pst_fenske_ver02/data/sn84026497/00280776129/1880042101/0834/002_6_96.jpg",
  "page_url": "https://chroniclingamerica.loc.gov/data/batches/pst_fenske_ver02/data/sn84026497/00280776129/1880042101/0834.jp2"
}
```
<!-- [[[end]]] -->

<!-- [[[cog
from auto_dataset_card.core import generate_label_breakdown_tables, get_label_counts
ds = load_dataset("davanstrien/test_dataset_cogapp")
data = get_label_counts(ds)
cog.out(
f"""
# Label breakdowns \n
```
{data}
```
""")
]]] -->

# Label breakdowns

```
{'train': {'text-only': 376, 'illustrations': 173}}
```
<!-- [[[end]]] -->

<!-- [[[cog
from auto_dataset_card.core import generate_label_breakdown_tables, get_label_counts
ds = load_dataset("davanstrien/test_dataset_cogapp")
data = get_label_counts(ds)
tables = generate_label_breakdown_tables(data)
split = tables[0][0]
table = tables[0][1]
cog.out(
f"""
# Label breakdown table for split: {split} \n
{table}
""")
]]] -->

# Label breakdown table for split: train

| Label         |   Count | Percentage   |
|---------------|---------|--------------|
| text-only     |     376 | 68.49%       |
| illustrations |     173 | 31.51%       |
<!-- [[[end]]] -->
Provider:
davanstrien
Summary of original information

Dataset overview

Dataset features

  • file: string
  • image: image
  • label: class label with two classes:
    • 0: text-only
    • 1: illustrations
  • pub_date: timestamp, nanosecond resolution
  • page_seq_num: integer (int64)
  • edition_seq_num: integer (int64)
  • batch: string
  • lccn: string
  • box: sequence of floats (float32)
  • score: float (float64)
  • ocr: string
  • place_of_publication: string
  • geographic_coverage: string
  • name: string
  • publisher: string
  • url: string
  • page_url: string
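The card's first-row example stringifies every value with `str(v)`, so the column types listed above are not visible in it. The following is a minimal sketch, using only the Python standard library, of how a few of those stringified values could be converted back to the listed types. The names `ParsedRow`, `LABEL_NAMES`, and `parse_stringified_row` are hypothetical and are not part of the dataset or of any library; the sample values are copied from the card's example row.

```python
import ast
from datetime import datetime
from typing import List, TypedDict


class ParsedRow(TypedDict):
    """Hypothetical typed view of a subset of the columns (illustration only)."""
    file: str
    label: int
    pub_date: datetime
    page_seq_num: int
    edition_seq_num: int
    box: List[float]
    score: float


# Class-label names as declared in the card's metadata.
LABEL_NAMES = {0: "text-only", 1: "illustrations"}


def parse_stringified_row(row: dict) -> ParsedRow:
    """Convert the str()-rendered values from the card's example back to typed values."""
    return ParsedRow(
        file=row["file"],
        label=int(row["label"]),
        pub_date=datetime.strptime(row["pub_date"], "%Y-%m-%d %H:%M:%S"),
        page_seq_num=int(row["page_seq_num"]),
        edition_seq_num=int(row["edition_seq_num"]),
        # The box is rendered as a Python list literal of four floats.
        box=[float(x) for x in ast.literal_eval(row["box"])],
        score=float(row["score"]),
    )


# Values copied from the first-row example in the card.
example = {
    "file": "pst_fenske_ver02_data_sn84026497_00280776129_1880042101_0834_002_6_96.jpg",
    "label": "0",
    "pub_date": "1880-04-21 00:00:00",
    "page_seq_num": "834",
    "edition_seq_num": "1",
    "box": "[0.649412214756012, 0.6045778393745422, 0.8002520799636841, 0.7152365446090698]",
    "score": "0.9609346985816956",
}

parsed = parse_stringified_row(example)
print(LABEL_NAMES[parsed["label"]], parsed["pub_date"].year, len(parsed["box"]))
```

When the dataset is loaded through a dataset library rather than read from the card, the values would normally already carry their declared types, so a conversion like this is only relevant to the stringified preview.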

Dataset splits

  • train:
    • Size: 48233952 bytes
    • Number of examples: 549

Label distribution

  • Label distribution of the train split:
    • text-only: 376 (68.49%)
    • illustrations: 173 (31.51%)
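As a quick check of the percentages above, the sketch below recomputes them from the raw counts. The function name `label_percentages` is hypothetical; it is not the `get_label_counts` or `generate_label_breakdown_tables` helper used in the card, whose implementations are not shown in the source.

```python
def label_percentages(counts: dict) -> dict:
    """Return each label's share of the total, as a percentage rounded to 2 decimals."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Counts for the train split, as reported in the card.
train_counts = {"text-only": 376, "illustrations": 173}

total_examples = sum(train_counts.values())
percentages = label_percentages(train_counts)

print(total_examples)
for label, pct in percentages.items():
    print(f"{label}: {train_counts[label]} ({pct:.2f}%)")
```

The counts sum to 549, which matches the number of examples reported for the train split.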