
TomAcolab/unsplash-lite

Source: Hugging Face. Updated 2026-04-16. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/TomAcolab/unsplash-lite
Description:
---
dataset_info:
- config_name: default
  features:
  - name: image
    dtype: image
  - name: keywords
    dtype: string
  splits:
  - name: train
    num_bytes: 2045209850.972
    num_examples: 24996
  download_size: 1935601893
  dataset_size: 2045209850.972
- config_name: embeddings_clip-ViT-B-32
  features:
  - name: embeddings_clip-ViT-B-32
    sequence: float32
  splits:
  - name: train
    num_bytes: 51291792
    num_examples: 24996
  download_size: 66797740
  dataset_size: 51291792
- config_name: embeddings_metaclip-2-worldwide-s16-384
  features:
  - name: embeddings_metaclip-2-worldwide-s16-384
    sequence: float32
  splits:
  - name: train
    num_bytes: 38493840
    num_examples: 24996
  download_size: 53998486
  dataset_size: 38493840
- config_name: embeddings_metaclip-2-worldwide-s16-384-eng-32768
  features:
  - name: embeddings_metaclip-2-worldwide-s16-384-eng-32768
    sequence: float32
  splits:
  - name: train
    num_bytes: 38493840
    num_examples: 24996
  download_size: 53998780
  dataset_size: 38493840
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
- config_name: embeddings_clip-ViT-B-32
  data_files:
  - split: train
    path: embeddings_clip-ViT-B-32/train-*
- config_name: embeddings_metaclip-2-worldwide-s16-384
  data_files:
  - split: train
    path: embeddings_metaclip-2-worldwide-s16-384/train-*
- config_name: embeddings_metaclip-2-worldwide-s16-384-eng-32768
  data_files:
  - split: train
    path: embeddings_metaclip-2-worldwide-s16-384-eng-32768/train-*
---

# Unsplash Lite

[Unsplash Lite Dataset](https://unsplash.com/data).

This dataset contains:
- a subset `default` with about 25k images and their related keywords, when available. Keywords are separated by `;`, and only those whose confidence score reported by Unsplash is higher than 90% were kept
- a subset `embeddings_clip-ViT-B-32` which contains precomputed embeddings of the images via the [clip-ViT-B-32](https://huggingface.co/sentence-transformers/clip-ViT-B-32) model by OpenAI
- a subset `embeddings_metaclip-2-worldwide-s16-384` which contains precomputed embeddings of the images via the [metaclip-2-worldwide-s16-384](https://huggingface.co/facebook/metaclip-2-worldwide-s16-384) model by Meta
- a subset `embeddings_metaclip-2-worldwide-s16-384-eng-32768` which contains precomputed embeddings of the images via the [metaclip-2-worldwide-s16-384-eng-32768](https://huggingface.co/alphaedge-ai/metaclip-2-worldwide-s16-384-eng-32768) model by AlphaEdge

The three precomputed embedding subsets are used by the tutorial notebooks in the [Sentence Transformers documentation](https://sbert.net/examples/sentence_transformer/applications/image-search/README.html), which show how to use the library to perform [(multilingual) zero-shot image classification](https://github.com/huggingface/sentence-transformers/blob/main/examples/sentence_transformer/applications/image-search/Image_Classification.ipynb), [monolingual](https://github.com/huggingface/sentence-transformers/blob/main/examples/sentence_transformer/applications/image-search/Image_Search.ipynb)/[multilingual](https://github.com/huggingface/sentence-transformers/blob/main/examples/sentence_transformer/applications/image-search/image_search_multilingual.ipynb) image search, [image clustering](https://github.com/huggingface/sentence-transformers/blob/main/examples/sentence_transformer/applications/image-search/Image_Clustering.ipynb) and [image de-duplication](https://github.com/huggingface/sentence-transformers/blob/main/examples/sentence_transformer/applications/image-search/Image_Duplicates.ipynb).
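
The subsets described above could be consumed roughly as follows. This is a minimal sketch, not the notebooks' actual code: the helper names (`split_keywords`, `cosine_similarity`, `top_k`, `load_subset`) are hypothetical, `load_subset` assumes the `datasets` library is installed, and the ranking is written in plain Python for readability (a vectorized NumPy version would be preferable on the full set of embeddings).

```python
import math

REPO_ID = "TomAcolab/unsplash-lite"  # repository name taken from this card


def split_keywords(raw):
    """Turn the `;`-separated `keywords` string into a clean list of keywords."""
    if not raw:  # keywords are only present "when available"
        return []
    return [kw.strip() for kw in raw.split(";") if kw.strip()]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, embeddings, k=3):
    """Return the k (row_index, score) pairs most similar to `query`."""
    scored = [(i, cosine_similarity(query, emb)) for i, emb in enumerate(embeddings)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def load_subset(config_name="default"):
    """Load one subset (config) of the dataset; this downloads data from the Hub."""
    from datasets import load_dataset  # imported lazily so the helpers above work offline

    return load_dataset(REPO_ID, config_name, split="train")
```

For example, `load_subset("embeddings_clip-ViT-B-32")` would return the CLIP embeddings, and `top_k(query_vector, rows["embeddings_clip-ViT-B-32"])` would rank them against a query vector produced by the same model. Mapping a returned row index back to an image in `default` assumes the subsets are row-aligned; the card does not state this explicitly, although each subset lists 24996 examples.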

# License

https://unsplash.com/license

TL;DR: Unsplash images are designed to be used freely, and our license reflects this.
- All images can be downloaded and used for free
- For commercial and non-commercial purposes
- No permission required (attribution is always appreciated!)

What is not allowed 👎
- Selling the images without significant modification.
- Compiling images from Unsplash to replicate a similar or competing service.

Credit to [lbourdois](https://huggingface.co/lbourdois) for collecting and uploading this dataset.
Provider:
TomAcolab