severo/embellishments
Source: Hugging Face. Updated 2022-10-25; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/severo/embellishments
Resource description:
---
annotations_creators:
- no-annotation
license:
- cc0-1.0
size_categories:
- n<1K
source_datasets:
- original
pretty_name: Digitised Books - Images identified as Embellishments. c. 1510 - c. 1900. JPG
---
# Dataset Card for severo/embellishments
## Dataset Description
- **Homepage:** [Digitised Books - Images identified as Embellishments - Homepage](https://bl.iro.bl.uk/concern/datasets/59d1aa35-c2d7-46e5-9475-9d0cd8df721e)
- **Point of Contact:** [Sylvain Lesage](mailto:sylvain.lesage@huggingface.co)
### Dataset Summary
This small dataset contains the thumbnails of the first 100 entries of [Digitised Books - Images identified as Embellishments. c. 1510 - c. 1900. JPG](https://bl.iro.bl.uk/concern/datasets/59d1aa35-c2d7-46e5-9475-9d0cd8df721e). It has been uploaded to the Hub to reproduce the tutorial by Daniel van Strien: [Using 🤗 datasets for image search](https://danielvanstrien.xyz/metadata/deployment/huggingface/ethics/huggingface-datasets/faiss/2022/01/13/image_search.html).
## Dataset Structure
### Data Instances
A typical row contains an image thumbnail, its filename, and the year of publication of the book it was extracted from.
An example looks as follows:
```
{
'fname': '000811462_05_000205_1_The Pictorial History of England being a history of the people as well as a hi_1855.jpg',
'year': '1855',
'path': 'embellishments/1855/000811462_05_000205_1_The Pictorial History of England being a history of the people as well as a hi_1855.jpg',
'img': ...
}
```
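The example row suggests that the three text fields are redundant with each other: `path` appears to be `embellishments/<year>/<fname>`, and `fname` appears to end in `_<year>.jpg`. This is inferred from the single row shown above and is not stated anywhere in the card, so treat the following as a minimal sketch under that assumption. `check_row` is a hypothetical helper, not part of the dataset or of any library.

```python
import posixpath
import re

# Assumption (inferred from one example row, not documented by the card):
# `fname` ends with "_<year>.jpg" and `path` is "embellishments/<year>/<fname>".
YEAR_SUFFIX = re.compile(r"_(\d{4})\.jpg$")


def check_row(row):
    """Return True if `fname`, `year` and `path` of a row agree with each other."""
    folder, fname = posixpath.split(row["path"])
    match = YEAR_SUFFIX.search(row["fname"])
    return (
        fname == row["fname"]                          # path ends with the filename
        and posixpath.basename(folder) == row["year"]  # parent folder is the year
        and match is not None
        and match.group(1) == row["year"]              # filename suffix is the year
    )


row = {
    "fname": "000811462_05_000205_1_The Pictorial History of England being a history of the people as well as a hi_1855.jpg",
    "year": "1855",
    "path": "embellishments/1855/000811462_05_000205_1_The Pictorial History of England being a history of the people as well as a hi_1855.jpg",
}
print(check_row(row))
```

A check like this can be run over all rows to find out whether the convention really holds before relying on it.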
### Data Fields
- `fname`: the image filename.
- `year`: a string with the year of publication of the book from which the image has been extracted.
- `path`: local path to the image.
- `img`: a thumbnail of the image with a max height and width of 224 pixels.
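The card does not say how the thumbnails were produced. As an illustration only, the sketch below shows one common reading of "max height and width of 224 pixels": scale the image down so that its longer side is at most 224 pixels, keep the aspect ratio, and never upscale. `thumbnail_size` is a hypothetical helper, and the rounding may differ from what an imaging library would do.

```python
def thumbnail_size(width, height, max_side=224):
    """Return a (width, height) that fits in a max_side x max_side box.

    Assumption: aspect ratio is preserved and small images are left as-is.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Scale factor that fits both dimensions, capped at 1.0 to avoid upscaling.
    scale = min(max_side / width, max_side / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(thumbnail_size(448, 224))
print(thumbnail_size(100, 50))
```

Under this reading, every `img` in the dataset would have both sides no larger than 224 pixels, which is easy to verify on the loaded images.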
### Data Splits
The dataset only contains 100 rows, in a single 'train' split.
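A minimal sketch of loading the split with the 🤗 `datasets` library follows. The repository id is taken from the card title; whether the repository still loads with a current library version has not been checked here. `load_embellishments` and `count_by_year` are hypothetical helper names.

```python
from collections import Counter


def load_embellishments():
    """Load the single 'train' split from the Hub (needs network access)."""
    from datasets import load_dataset  # pip install datasets

    return load_dataset("severo/embellishments", split="train")


def count_by_year(rows):
    """Count rows per publication year; `year` is stored as a string."""
    return Counter(row["year"] for row in rows)
```

Calling `ds = load_embellishments()` and then `len(ds)` should give 100 if the card is accurate, and `count_by_year(ds)` shows how those rows are spread across publication years.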
## Dataset Creation
### Curation Rationale
This dataset was chosen by Daniel van Strien for his tutorial [Using 🤗 datasets for image search](https://danielvanstrien.xyz/metadata/deployment/huggingface/ethics/huggingface-datasets/faiss/2022/01/13/image_search.html), which includes the Python code for building the image search.
### Source Data
#### Initial Data Collection and Normalization
As stated on the British Library webpage:
> The images were algorithmically gathered from 49,455 digitised books, equating to 65,227 volumes (25+ million pages), published between c. 1510 - c. 1900. The books cover a wide range of subject areas including philosophy, history, poetry and literature. The images are in .JPEG format.
#### Who are the source data producers?
British Library, British Library Labs, Adrian Edwards (Curator), Neil Fitzgerald (Contributor ORCID)
### Annotations
The dataset does not contain any additional annotations.
#### Annotation process
[N/A]
#### Who are the annotators?
[N/A]
### Personal and Sensitive Information
[N/A]
## Considerations for Using the Data
### Social Impact of Dataset
[N/A]
### Discussion of Biases
[N/A]
### Other Known Limitations
This is a toy dataset that aims at:
- validating the process described in the tutorial [Using 🤗 datasets for image search](https://danielvanstrien.xyz/metadata/deployment/huggingface/ethics/huggingface-datasets/faiss/2022/01/13/image_search.html) by Daniel van Strien,
- showing the [dataset viewer](https://huggingface.co/datasets/severo/embellishments/viewer/severo--embellishments/train) on an image dataset.
## Additional Information
### Dataset Curators
The dataset was created by Sylvain Lesage at Hugging Face, to replicate the tutorial [Using 🤗 datasets for image search](https://danielvanstrien.xyz/metadata/deployment/huggingface/ethics/huggingface-datasets/faiss/2022/01/13/image_search.html) by Daniel van Strien.
### Licensing Information
CC0 1.0 Universal Public Domain
Provider: severo
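# Summary of Original Information
## Dataset Overview
### Dataset Name
- Name: Digitised Books - Images identified as Embellishments. c. 1510 - c. 1900. JPG
- Alias: severo/embellishments
## Dataset Description
### Dataset Summary
- Content: 100 image thumbnails taken from books published between c. 1510 and c. 1900, where the images were identified as embellishments.
- Purpose: reproducing Daniel van Strien's tutorial Using 🤗 datasets for image search.
## Dataset Structure
### Data Instances
- Composition: each instance contains an image thumbnail, its filename, and the year of publication of the book the image comes from.
- Example: `{ 'fname': '000811462_05_000205_1_The Pictorial History of England being a history of the people as well as a hi_1855.jpg', 'year': '1855', 'path': 'embellishments/1855/000811462_05_000205_1_The Pictorial History of England being a history of the people as well as a hi_1855.jpg', 'img': ... }`
### Data Fields
- `fname`: the image filename.
- `year`: a string with the year of publication of the book the image comes from.
- `path`: local path to the image.
- `img`: a thumbnail of the image with a max height and width of 224 pixels.
### Data Splits
- Split: a single `train` split with 100 rows.
## Dataset Creation
### Source Data
- Source: algorithmically gathered from 49,455 digitised books published between c. 1510 and c. 1900.
- Format: JPEG.
- Data producers: British Library, British Library Labs, Adrian Edwards (Curator), Neil Fitzgerald (Contributor ORCID).
### Dataset Curators
- Creator: Sylvain Lesage at Hugging Face
### Licensing Information
- License: CC0 1.0 Universal Public Domain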



