TomAcolab/unsplash-lite
Hugging Face · Updated 2026-04-16 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/TomAcolab/unsplash-lite
Description:
---
dataset_info:
- config_name: default
  features:
  - name: image
    dtype: image
  - name: keywords
    dtype: string
  splits:
  - name: train
    num_bytes: 2045209850.972
    num_examples: 24996
  download_size: 1935601893
  dataset_size: 2045209850.972
- config_name: embeddings_clip-ViT-B-32
  features:
  - name: embeddings_clip-ViT-B-32
    sequence: float32
  splits:
  - name: train
    num_bytes: 51291792
    num_examples: 24996
  download_size: 66797740
  dataset_size: 51291792
- config_name: embeddings_metaclip-2-worldwide-s16-384
  features:
  - name: embeddings_metaclip-2-worldwide-s16-384
    sequence: float32
  splits:
  - name: train
    num_bytes: 38493840
    num_examples: 24996
  download_size: 53998486
  dataset_size: 38493840
- config_name: embeddings_metaclip-2-worldwide-s16-384-eng-32768
  features:
  - name: embeddings_metaclip-2-worldwide-s16-384-eng-32768
    sequence: float32
  splits:
  - name: train
    num_bytes: 38493840
    num_examples: 24996
  download_size: 53998780
  dataset_size: 38493840
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
- config_name: embeddings_clip-ViT-B-32
  data_files:
  - split: train
    path: embeddings_clip-ViT-B-32/train-*
- config_name: embeddings_metaclip-2-worldwide-s16-384
  data_files:
  - split: train
    path: embeddings_metaclip-2-worldwide-s16-384/train-*
- config_name: embeddings_metaclip-2-worldwide-s16-384-eng-32768
  data_files:
  - split: train
    path: embeddings_metaclip-2-worldwide-s16-384-eng-32768/train-*
---
# Unsplash Lite
Based on the [Unsplash Lite Dataset](https://unsplash.com/data).
This dataset contains:
- a subset `default` with about 25k images and their related keywords, when available. Keywords are separated by `;`, and only those with an Unsplash confidence score above 90% were kept
- a subset `embeddings_clip-ViT-B-32` which contains precomputed embeddings of the images via the [clip-ViT-B-32](https://huggingface.co/sentence-transformers/clip-ViT-B-32) model by OpenAI
- a subset `embeddings_metaclip-2-worldwide-s16-384` which contains precomputed embeddings of the images via the [metaclip-2-worldwide-s16-384](https://huggingface.co/facebook/metaclip-2-worldwide-s16-384) model by Meta
- a subset `embeddings_metaclip-2-worldwide-s16-384-eng-32768` which contains precomputed embeddings of the images via the [metaclip-2-worldwide-s16-384-eng-32768](https://huggingface.co/alphaedge-ai/metaclip-2-worldwide-s16-384-eng-32768) model by AlphaEdge
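As a minimal sketch of how the `default` subset described above might be consumed, the snippet below splits the `;`-separated `keywords` string into a list and shows one way to load the split with the `datasets` library. The helper names `split_keywords` and `load_default_split` are hypothetical, and the handling of missing keywords (treated here as `None` or an empty string) is an assumption, since the card only says keywords are present "when available".

```python
def split_keywords(raw):
    """Split a `;`-separated keyword string into a clean list of keywords."""
    # Assumption: images without keywords carry None or an empty string.
    if not raw:
        return []
    # Strip whitespace around each keyword and drop empty fragments.
    return [kw.strip() for kw in raw.split(";") if kw.strip()]


def load_default_split():
    """Load the `default` subset (requires the `datasets` package and network access)."""
    # Imported lazily so that split_keywords() has no third-party dependency.
    from datasets import load_dataset

    return load_dataset("TomAcolab/unsplash-lite", "default", split="train")
```

Calling `load_default_split()` downloads roughly 1.9 GB according to the card metadata; each row then exposes an `image` and a `keywords` field, and `split_keywords(row["keywords"])` turns the latter into a list.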
These three precomputed embedding subsets are useful for the tutorial notebooks in the [Sentence Transformers documentation](https://sbert.net/examples/sentence_transformer/applications/image-search/README.html), which show how to use the library to perform [(multilingual) zero-shot image classification](https://github.com/huggingface/sentence-transformers/blob/main/examples/sentence_transformer/applications/image-search/Image_Classification.ipynb), [monolingual](https://github.com/huggingface/sentence-transformers/blob/main/examples/sentence_transformer/applications/image-search/Image_Search.ipynb)/[multilingual](https://github.com/huggingface/sentence-transformers/blob/main/examples/sentence_transformer/applications/image-search/image_search_multilingual.ipynb) image search, [image clustering](https://github.com/huggingface/sentence-transformers/blob/main/examples/sentence_transformer/applications/image-search/Image_Clustering.ipynb) and [image de-duplication](https://github.com/huggingface/sentence-transformers/blob/main/examples/sentence_transformer/applications/image-search/Image_Duplicates.ipynb).
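To illustrate how precomputed embeddings can support image search, here is a small sketch that ranks rows by cosine similarity to a query vector. It is not taken from the notebooks: the helper names `cosine_similarity`, `top_k_similar` and `load_embeddings` are hypothetical, the pure-Python scoring is chosen only to keep the example dependency-free, and it assumes that row `i` of an embeddings subset corresponds to row `i` of `default` (both list 24996 examples, but the card does not state the alignment explicitly).

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_similar(query, embeddings, k=5):
    """Return the k (row_index, score) pairs most similar to `query`, best first."""
    scored = ((i, cosine_similarity(query, emb)) for i, emb in enumerate(embeddings))
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


def load_embeddings(config="embeddings_clip-ViT-B-32"):
    """Load one embeddings subset (requires the `datasets` package and network access)."""
    from datasets import load_dataset

    ds = load_dataset("TomAcolab/unsplash-lite", config, split="train")
    # Per the card metadata, the column name matches the config name.
    return list(ds[config])
```

For image-to-image search, one of the loaded rows can serve as the query, for example `top_k_similar(embs[0], embs, k=5)` after `embs = load_embeddings()`. A text query would first have to be encoded with the same model that produced the subset. For real workloads a vectorised approach (NumPy or the utilities shipped with Sentence Transformers) would likely be much faster than this loop.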
# License
https://unsplash.com/license
TL;DR:
Unsplash images are designed to be used freely, and our license reflects this.
- All images can be downloaded and used for free
- For commercial and non-commercial purposes
- No permission required (attribution is always appreciated!)
What is not allowed 👎
- Selling the images without significant modification.
- Compiling images from Unsplash to replicate a similar or competing service.
Credit to [lbourdois](https://huggingface.co/lbourdois) for collecting and uploading this dataset.
Provided by:
TomAcolab



