
biglam/illustrated_ads

Hugging Face | Updated: 2023-01-18 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/biglam/illustrated_ads
Resource description:
---
annotations_creators:
- expert-generated
language: []
language_creators: []
license:
- cc0-1.0
multilinguality: []
pretty_name: 19th Century United States Newspaper Advert images with 'illustrated' or 'non illustrated' labels
size_categories:
- n<1K
source_datasets: []
tags:
- lam
- historic newspapers
task_categories:
- image-classification
task_ids:
- multi-class-image-classification
---

The dataset contains images derived from the [Newspaper Navigator](https://news-navigator.labs.loc.gov/), a dataset of images drawn from the Library of Congress [Chronicling America](https://chroniclingamerica.loc.gov/) collection.

> [The Newspaper Navigator dataset](https://news-navigator.labs.loc.gov/) consists of extracted visual content for 16,358,041 historic newspaper pages in Chronicling America. The visual content was identified using an object detection model trained on annotations of World War 1-era Chronicling America pages, including annotations made by volunteers as part of the Beyond Words crowdsourcing project.

source: https://news-navigator.labs.loc.gov/

One of the categories of visual content in Newspaper Navigator is 'advertisements'. This dataset contains a sample of these advertisement images with additional labels indicating whether the advert is 'illustrated' or 'not illustrated'.

This dataset was created for use in a [Programming Historian tutorial](http://programminghistorian.github.io/ph-submissions/lessons/computer-vision-deep-learning-pt1). Its primary aim is to provide a realistic example dataset for teaching computer vision with digitised heritage material.

# Dataset Card for 19th Century United States Newspaper Advert images with 'illustrated' or 'non illustrated' labels

## Table of Contents

- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:**
- **Repository:** [https://doi.org/10.5281/zenodo.5838410](https://doi.org/10.5281/zenodo.5838410)
- **Paper:** [https://doi.org/10.46430/phen0101](https://doi.org/10.46430/phen0101)
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

The dataset contains images derived from the [Newspaper Navigator](https://news-navigator.labs.loc.gov/), a dataset of images drawn from the Library of Congress [Chronicling America](https://chroniclingamerica.loc.gov/) collection.

> [The Newspaper Navigator dataset](https://news-navigator.labs.loc.gov/) consists of extracted visual content for 16,358,041 historic newspaper pages in Chronicling America. The visual content was identified using an object detection model trained on annotations of World War 1-era Chronicling America pages, including annotations made by volunteers as part of the Beyond Words crowdsourcing project.

source: https://news-navigator.labs.loc.gov/

One of the categories of visual content in Newspaper Navigator is 'advertisements'. This dataset contains a sample of these advertisement images with additional labels indicating whether the advert is 'illustrated' or 'not illustrated'.

This dataset was created for use in a [Programming Historian tutorial](http://programminghistorian.github.io/ph-submissions/lessons/computer-vision-deep-learning-pt1). Its primary aim is to provide a realistic example dataset for teaching computer vision with digitised heritage material.

### Supported Tasks and Leaderboards

- `image-classification`: the primary purpose of this dataset is to classify historic newspaper images identified as 'advertisements' into 'illustrated' and 'not-illustrated' categories.

### Languages

[More Information Needed]

## Dataset Structure

### Data Instances

An example instance from this dataset:

```python
{'file': 'pst_fenske_ver02_data_sn84026497_00280776129_1880042101_0834_002_6_96.jpg',
 'image': <PIL.JpegImagePlugin.JpegImageFile image mode=L size=388x395 at 0x7F9A72038950>,
 'label': 0,
 'pub_date': Timestamp('1880-04-21 00:00:00'),
 'page_seq_num': 834,
 'edition_seq_num': 1,
 'batch': 'pst_fenske_ver02',
 'lccn': 'sn84026497',
 'box': [0.649412214756012,
  0.6045778393745422,
  0.8002520799636841,
  0.7152365446090698],
 'score': 0.9609346985816956,
 'ocr': "H. II. IIASLKT & SOXN, Dealers in General Merchandise In New Store Room nt HASLET'S COS ITERS, 'JTionoMtii, ln. .Tau'y 1st, 1?0.",
 'place_of_publication': 'Tionesta, Pa.',
 'geographic_coverage': "['Pennsylvania--Forest--Tionesta']",
 'name': 'The Forest Republican. [volume]',
 'publisher': 'Ed. W. Smiley',
 'url': 'https://news-navigator.labs.loc.gov/data/pst_fenske_ver02/data/sn84026497/00280776129/1880042101/0834/002_6_96.jpg',
 'page_url': 'https://chroniclingamerica.loc.gov/data/batches/pst_fenske_ver02/data/sn84026497/00280776129/1880042101/0834.jp2'}
```

### Data Fields

[More Information Needed]

### Data Splits

The dataset contains a single split.

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

The annotation process is described in this [GitHub repository](https://github.com/Living-with-machines/nnanno).

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

```bibtex
@dataset{van_strien_daniel_2021_5838410,
  author    = {van Strien, Daniel},
  title     = {{19th Century United States Newspaper Advert images with 'illustrated' or 'non illustrated' labels}},
  month     = oct,
  year      = 2021,
  publisher = {Zenodo},
  version   = {0.0.1},
  doi       = {10.5281/zenodo.5838410},
  url       = {https://doi.org/10.5281/zenodo.5838410}
}
```

### Contributions

Thanks to [@davanstrien](https://github.com/davanstrien) for adding this dataset.
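
In the example instance, the `url` field appears to repeat several of the other metadata fields (`batch`, `lccn`, `pub_date`, `edition_seq_num`, `page_seq_num`). The sketch below recovers them from the URL alone, which can serve as a consistency check on a record. The path layout is inferred from that single example and is an assumption, as are the function name `parse_navigator_url` and the `reel` and `crop_file` keys, which are not fields of the dataset.

```python
from datetime import date
from urllib.parse import urlparse


def parse_navigator_url(url: str) -> dict:
    """Split a Newspaper Navigator image URL into provenance fields.

    Assumed layout (inferred from one example instance, not documented):
    /data/<batch>/data/<lccn>/<reel>/<YYYYMMDD + 2-digit edition>/<page seq>/<crop file>
    """
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 8 or parts[0] != "data" or parts[2] != "data":
        raise ValueError(f"unexpected URL layout: {url}")
    _, batch, _, lccn, reel, issue, page_seq, crop_file = parts
    if len(issue) != 10 or not issue.isdigit():
        raise ValueError(f"unexpected issue segment: {issue}")
    return {
        "batch": batch,
        "lccn": lccn,
        "reel": reel,  # hypothetical key; not a field in the dataset
        "pub_date": date(int(issue[:4]), int(issue[4:6]), int(issue[6:8])),
        "edition_seq_num": int(issue[8:]),
        "page_seq_num": int(page_seq),
        "crop_file": crop_file,  # hypothetical key; file name of the cropped advert
    }


example_url = (
    "https://news-navigator.labs.loc.gov/data/pst_fenske_ver02/data/"
    "sn84026497/00280776129/1880042101/0834/002_6_96.jpg"
)
info = parse_navigator_url(example_url)
print(info["batch"], info["lccn"], info["pub_date"], info["page_seq_num"])
```

For the example above, the parsed values agree with the `batch`, `lccn`, `pub_date`, `edition_seq_num`, and `page_seq_num` fields of the same record. Other records may not follow this layout, so the function raises `ValueError` rather than guessing.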
Provider:
biglam
Summary of Original Information

Dataset Overview

Dataset Name

  • Name: 19th Century United States Newspaper Advert images with illustrated or non illustrated labels

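Dataset Attributes

  • Language: no language information specified
  • License: CC0-1.0
  • Multilinguality: not applicable
  • Size: fewer than 1,000 instances
  • Annotation creators: expert-generated
  • Task category: image classification
  • Task ID: multi-class image classification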
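
For a classification task on a dataset this small, a majority-class baseline gives a quick reference point: any trained classifier should beat the accuracy obtained by always predicting the most frequent label. The following is a minimal sketch, not part of the original card; the function name and the label values are made up for illustration and do not reflect the dataset's actual class balance.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy achieved by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(labels)


# Made-up labels for illustration only.
toy_labels = [0, 0, 0, 1]
print(majority_baseline_accuracy(toy_labels))
```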

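Dataset Description

  • Summary: The dataset contains images extracted from Newspaper Navigator, which draws on the Library of Congress Chronicling America collection. Each image is an advertisement labelled as illustrated or non illustrated.
  • Purpose: Primarily for teaching computer vision, particularly for working with digitised heritage material.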

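Dataset Structure

  • Data instances: Each instance includes the file name, image, label, publication date, page sequence number, edition sequence number, batch, LCCN, bounding box, score, OCR text, place of publication, geographic coverage, name, publisher, URL, and page URL.
  • Data fields: Include the image, its label, and the other details listed above.
  • Data splits: A single split.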
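
Because the dataset ships as a single split, anyone training a classifier has to carve out their own held-out set. The sketch below shows one way to do that with a stratified split over a list of labels, using only the standard library. The function name, the 20% validation fraction, and the seed are assumptions chosen for illustration, and the labels are made up.

```python
import random
from collections import defaultdict


def stratified_split(labels, val_fraction=0.2, seed=42):
    """Return (train_indices, val_indices) with similar label proportions in each."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")

    # Group example indices by their label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        # Hold out at least one example per label when the label has more than one.
        n_val = max(1, round(len(indices) * val_fraction)) if len(indices) > 1 else 0
        val.extend(indices[:n_val])
        train.extend(indices[n_val:])
    return sorted(train), sorted(val)


# Made-up labels for illustration only; not the dataset's real class balance.
toy_labels = [0] * 30 + [1] * 20
train_idx, val_idx = stratified_split(toy_labels)
print(len(train_idx), len(val_idx))
```

The returned index lists can then be used to select rows from whatever container holds the examples, for instance with `Dataset.select` if the data was loaded through the Hugging Face `datasets` library.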

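Dataset Creation

  • Source data: Drawn from Newspaper Navigator, which contains visual content extracted from 16,358,041 historic newspaper pages.
  • Annotations: The annotation process is described in a GitHub repository.

Considerations for Use

  • Social impact: No details provided.
  • Discussion of biases: No details provided.
  • Other known limitations: No details provided.

Additional Information

  • Dataset curator: Daniel van Strien

  • License information: CC0-1.0

  • Citation information (BibTeX): @dataset{van_strien_daniel_2021_5838410, author = {van Strien, Daniel}, title = {{19th Century United States Newspaper Advert images with illustrated or non illustrated labels}}, month = oct, year = 2021, publisher = {Zenodo}, version = {0.0.1}, doi = {10.5281/zenodo.5838410}, url = {https://doi.org/10.5281/zenodo.5838410} }

  • Contributors: Thanks to @davanstrien for adding this dataset.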
