
severo/embellishments

Hugging Face · Updated 2022-10-25 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/severo/embellishments
Resource overview:
---
annotations_creators:
- no-annotation
license:
- cc0-1.0
size_categories:
- n<1K
source_datasets:
- original
pretty_name: Digitised Books - Images identified as Embellishments. c. 1510 - c. 1900. JPG
---

# Dataset Card for severo/embellishments

## Dataset Description

- **Homepage:** [Digitised Books - Images identified as Embellishments - Homepage](https://bl.iro.bl.uk/concern/datasets/59d1aa35-c2d7-46e5-9475-9d0cd8df721e)
- **Point of Contact:** [Sylvain Lesage](mailto:sylvain.lesage@huggingface.co)

### Dataset Summary

This small dataset contains the thumbnails of the first 100 entries of [Digitised Books - Images identified as Embellishments. c. 1510 - c. 1900. JPG](https://bl.iro.bl.uk/concern/datasets/59d1aa35-c2d7-46e5-9475-9d0cd8df721e). It has been uploaded to the Hub to reproduce the tutorial by Daniel van Strien: [Using 🤗 datasets for image search](https://danielvanstrien.xyz/metadata/deployment/huggingface/ethics/huggingface-datasets/faiss/2022/01/13/image_search.html).

## Dataset Structure

### Data Instances

A typical row contains an image thumbnail, its filename, and the year of publication of the book it was extracted from.

An example looks as follows:

```
{
  'fname': '000811462_05_000205_1_The Pictorial History of England being a history of the people as well as a hi_1855.jpg',
  'year': '1855',
  'path': 'embellishments/1855/000811462_05_000205_1_The Pictorial History of England being a history of the people as well as a hi_1855.jpg',
  'img': ...
}
```

### Data Fields

- `fname`: the image filename.
- `year`: a string with the year of publication of the book from which the image has been extracted.
- `path`: local path to the image.
- `img`: a thumbnail of the image with a max height and width of 224 pixels.

### Data Splits

The dataset only contains 100 rows, in a single 'train' split.

## Dataset Creation

### Curation Rationale

This dataset was chosen by Daniel van Strien for his tutorial [Using 🤗 datasets for image search](https://danielvanstrien.xyz/metadata/deployment/huggingface/ethics/huggingface-datasets/faiss/2022/01/13/image_search.html), which includes the code in Python to do it.

### Source Data

#### Initial Data Collection and Normalization

As stated on the British Library webpage:

> The images were algorithmically gathered from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The images are in .JPEG format.

BCP-47 code is `en`.

#### Who are the source data producers?

British Library, British Library Labs, Adrian Edwards (Curator), Neil Fitzgerald (Contributor ORCID)

### Annotations

The dataset does not contain any additional annotations.

#### Annotation process

[N/A]

#### Who are the annotators?

[N/A]

### Personal and Sensitive Information

[N/A]

## Considerations for Using the Data

### Social Impact of Dataset

[N/A]

### Discussion of Biases

[N/A]

### Other Known Limitations

This is a toy dataset that aims at:

- validating the process described in the tutorial [Using 🤗 datasets for image search](https://danielvanstrien.xyz/metadata/deployment/huggingface/ethics/huggingface-datasets/faiss/2022/01/13/image_search.html) by Daniel van Strien,
- showing the [dataset viewer](https://huggingface.co/datasets/severo/embellishments/viewer/severo--embellishments/train) on an image dataset.

## Additional Information

### Dataset Curators

The dataset was created by Sylvain Lesage at Hugging Face, to replicate the tutorial [Using 🤗 datasets for image search](https://danielvanstrien.xyz/metadata/deployment/huggingface/ethics/huggingface-datasets/faiss/2022/01/13/image_search.html) by Daniel van Strien.

### Licensing Information

CC0 1.0 Universal Public Domain
Provider:
severo
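Summary of original information

Dataset overview

Dataset name

  • Name: Digitised Books - Images identified as Embellishments. c. 1510 - c. 1900. JPG
  • Alias: severo/embellishments

Dataset description

Dataset summary

  • Content: thumbnails of the first 100 entries of the source collection, whose images come from books published between c. 1510 and c. 1900 and were identified as embellishments.
  • Purpose: reproducing Daniel van Strien's tutorial "Using 🤗 datasets for image search".

Dataset structure

Data instances

  • Composition: each instance contains an image thumbnail, its filename, and the year of publication of the book the image was extracted from.

  • Example: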

    { fname: 000811462_05_000205_1_The Pictorial History of England being a history of the people as well as a hi_1855.jpg, year: 1855, path: embellishments/1855/000811462_05_000205_1_The Pictorial History of England being a history of the people as well as a hi_1855.jpg, img: ... }
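
In the example row, `path` looks like it is assembled from the year and the filename, and the filename itself ends with the same year. The following is a minimal sketch of that relationship, assuming the pattern seen in this single example holds for other rows (the card does not say so). The helper names `year_from_fname` and `expected_path` are hypothetical and not part of the dataset or of any library.

```python
import re
from typing import Optional

# Example row copied from the dataset card (the `img` field is omitted).
row = {
    "fname": "000811462_05_000205_1_The Pictorial History of England being a history of the people as well as a hi_1855.jpg",
    "year": "1855",
    "path": "embellishments/1855/000811462_05_000205_1_The Pictorial History of England being a history of the people as well as a hi_1855.jpg",
}


def year_from_fname(fname: str) -> Optional[str]:
    """Return the 4-digit year in a name ending with '_<year>.jpg', else None."""
    match = re.search(r"_(\d{4})\.jpg$", fname)
    return match.group(1) if match else None


def expected_path(year: str, fname: str) -> str:
    """Rebuild `path` under the assumed layout 'embellishments/<year>/<fname>'."""
    return f"embellishments/{year}/{fname}"


# Both checks compare the assumed pattern against the example row.
year_matches = year_from_fname(row["fname"]) == row["year"]
path_matches = expected_path(row["year"], row["fname"]) == row["path"]
print(year_matches, path_matches)
```

Note that `year` is a string in the dataset, so it is compared as a string here instead of being converted to an integer.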

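Data fields

  • fname: the image filename.
  • year: a string with the year of publication of the book from which the image was extracted.
  • path: local path to the image.
  • img: a thumbnail of the image with a maximum height and width of 224 pixels.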
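
The 224-pixel limit on `img` can be read as fitting each image inside a 224 x 224 box. The sketch below shows one way to compute such a size. It assumes the aspect ratio is preserved and that small images are not enlarged; the card states neither, and `thumbnail_size` is a hypothetical helper, not the code used to build the dataset.

```python
from typing import Tuple

MAX_SIDE = 224  # maximum height and width stated for the `img` field


def thumbnail_size(width: int, height: int, max_side: int = MAX_SIDE) -> Tuple[int, int]:
    """Fit (width, height) inside a max_side x max_side box.

    Assumptions (not confirmed by the dataset card): the aspect ratio is
    preserved and images already within the box are left at their size.
    """
    scale = min(max_side / width, max_side / height, 1.0)
    # Clamp to at least 1 pixel so extreme aspect ratios stay valid.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(thumbnail_size(448, 224))
print(thumbnail_size(100, 50))
```

With this rule, the longer side of a large image ends up at 224 pixels and the shorter side shrinks proportionally.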

Data splits

  • Split: a single train split with 100 rows.
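
A quick way to check the split and fields described above is to load the dataset with the 🤗 `datasets` library and compare it against the card. This is a sketch only: it assumes the dataset is still published under the id `severo/embellishments` with the four columns listed on the card, and `matches_card` and `load_train_split` are hypothetical helper names.

```python
EXPECTED_COLUMNS = {"fname", "year", "path", "img"}  # fields listed on the card
EXPECTED_ROWS = 100  # row count stated on the card


def matches_card(column_names, num_rows) -> bool:
    """True if the columns and row count agree with the dataset card."""
    return set(column_names) == EXPECTED_COLUMNS and num_rows == EXPECTED_ROWS


def load_train_split():
    """Download the single 'train' split from the Hugging Face Hub."""
    # Imported lazily so matches_card() works without the library installed.
    from datasets import load_dataset

    return load_dataset("severo/embellishments", split="train")


# Usage (needs network access and the `datasets` package):
#   ds = load_train_split()
#   print(matches_card(ds.column_names, ds.num_rows))
```

Keeping the check separate from the download makes it possible to test the comparison logic offline.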

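Dataset creation

Source data

  • Source: gathered algorithmically from 49,455 digitised books published between c. 1510 and c. 1900.
  • Format: JPEG.
  • Data producers: British Library, British Library Labs, Adrian Edwards (Curator), Neil Fitzgerald (Contributor ORCID).

Dataset curators

  • Curator: Sylvain Lesage at Hugging Face

License

  • License: CC0 1.0 Universal Public Domain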