
deepghs/csip_v1

Hugging Face | Updated 2025-11-17 | Indexed 2024-06-22
Download link:
https://hf-mirror.com/datasets/deepghs/csip_v1
Resource description:
---
task_categories:
- zero-shot-image-classification
tags:
- art
- anime
- style-classification
size_categories:
- 100K<n<1M
language:
- multilingual
license:
- cc-by-4.0
source_datasets:
- original
---

# CSIP v1: Contrastive Anime Style Image Pre-training Dataset

## Summary

The **CSIP v1** dataset represents a **roughly cleaned** version of the Contrastive anime Style Image Pre-training collection, specifically designed for **zero-shot image classification** tasks in the anime art domain. This comprehensive dataset contains diverse images from various **anime artists**, organized to facilitate style recognition and classification models. The dataset has been processed through initial cleaning procedures to remove obvious duplicates and low-quality samples while preserving the rich stylistic diversity that characterizes different anime creators.

This dataset serves as an intermediate step between the raw, unprocessed collection and the meticulously curated evaluation set, offering researchers and developers a **balanced compromise** between data volume and quality. The images are distributed across multiple zip archives (p0-p8) for efficient downloading and processing, making it suitable for large-scale pre-training applications where both quantity and reasonable quality are essential considerations.

The **CSIP v1** dataset is particularly valuable for **contrastive learning** approaches, where the stylistic differences between artists can be leveraged to train models that understand and recognize artistic signatures. With its size category of 100K to 1M samples, this dataset provides sufficient scale for training robust vision models while maintaining enough quality control to ensure meaningful learning outcomes. The dataset's organization supports various computer vision tasks beyond zero-shot classification, including style transfer, artist identification, and content-based image retrieval in the anime domain.
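
To make the contrastive setup mentioned in the summary concrete, here is a minimal sketch of how same-artist (positive) and different-artist (negative) image pairs might be sampled. The card does not describe any training pipeline, so the `sample_pair` helper, the `images_by_artist` mapping, and the file paths are all hypothetical and only illustrate the idea.

```python
import random
from typing import Dict, List, Tuple


def sample_pair(
    images_by_artist: Dict[str, List[str]],
    rng: random.Random,
    positive: bool,
) -> Tuple[str, str, int]:
    """Return (image_a, image_b, label); label is 1 for same artist, else 0.

    Assumes every artist has at least one image, at least two artists exist,
    and at least one artist has two or more images.
    """
    if positive:
        # Positive pair: two distinct images drawn from a single artist.
        candidates = [a for a, imgs in images_by_artist.items() if len(imgs) >= 2]
        artist = rng.choice(candidates)
        image_a, image_b = rng.sample(images_by_artist[artist], 2)
        return image_a, image_b, 1
    # Negative pair: one image from each of two different artists.
    artist_a, artist_b = rng.sample(sorted(images_by_artist), 2)
    image_a = rng.choice(images_by_artist[artist_a])
    image_b = rng.choice(images_by_artist[artist_b])
    return image_a, image_b, 0


# Hypothetical toy index; real paths depend on how the archives are extracted.
toy_index = {
    "artist_a": ["artist_a/1.png", "artist_a/2.png"],
    "artist_b": ["artist_b/1.png"],
}
pair = sample_pair(toy_index, random.Random(0), positive=True)
```

A contrastive loss would then pull the embeddings of label-1 pairs together and push label-0 pairs apart; the choice of loss and encoder is left open here.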

## Dataset Structure

The dataset is split into 9 zip archives for convenient downloading and processing:

- `csip_v1_p0.zip` to `csip_v1_p8.zip`

Each archive contains a portion of the cleaned anime style images organized by artist categories.

## Related Datasets

This repository is part of the CSIP dataset series:

- **Raw version**: [deepghs/csip](https://huggingface.co/datasets/deepghs/csip) - The original unprocessed collection
- **Cleaned version**: [deepghs/csip_v1](https://huggingface.co/datasets/deepghs/csip_v1) - This roughly cleaned version
- **Evaluation version**: [deepghs/csip_eval](https://huggingface.co/datasets/deepghs/csip_eval) - Human-picked subset for evaluation

## Usage

The dataset can be downloaded and used for various computer vision tasks, particularly for anime style classification and recognition. The zip archives can be extracted to access the image files organized by artist styles.

## Citation

```bibtex
@misc{csip_v1,
  title        = {CSIP v1: Contrastive Anime Style Image Pre-training Dataset},
  author       = {deepghs},
  howpublished = {\url{https://huggingface.co/datasets/deepghs/csip_v1}},
  year         = {2023},
  note         = {Roughly cleaned version of anime style images for zero-shot classification},
  abstract     = {The CSIP v1 dataset represents a roughly cleaned version of the Contrastive anime Style Image Pre-training collection, specifically designed for zero-shot image classification tasks in the anime art domain. This comprehensive dataset contains diverse images from various anime artists, organized to facilitate style recognition and classification models. The dataset has been processed through initial cleaning procedures to remove obvious duplicates and low-quality samples while preserving the rich stylistic diversity that characterizes different anime creators.},
  keywords     = {anime, style-classification, zero-shot-learning, computer-vision}
}
```
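
As a sketch of the download-and-extract workflow described under Usage, the snippet below builds the nine archive names and unpacks them with the standard library. It assumes the `huggingface_hub` package is installed and that the archives sit at the root of the `deepghs/csip_v1` dataset repository, which the card does not state explicitly. The helper names (`archive_names`, `extract_archive`, `download_and_extract`) and the output directory are illustrative.

```python
import zipfile
from pathlib import Path
from typing import List

REPO_ID = "deepghs/csip_v1"  # dataset repository named in the card
NUM_PARTS = 9                # archives p0 through p8


def archive_names(num_parts: int = NUM_PARTS) -> List[str]:
    """Return the archive file names csip_v1_p0.zip ... csip_v1_p8.zip."""
    return [f"csip_v1_p{i}.zip" for i in range(num_parts)]


def extract_archive(zip_path: str, out_dir: str) -> List[str]:
    """Extract one archive into out_dir and return its member names."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
        return zf.namelist()


def download_and_extract(out_dir: str = "csip_v1") -> None:
    """Download every archive from the Hub and unpack it into out_dir."""
    # Imported lazily so the helpers above work without huggingface_hub.
    from huggingface_hub import hf_hub_download

    for name in archive_names():
        # Assumption: each archive is stored at the repository root.
        local_zip = hf_hub_download(
            repo_id=REPO_ID, filename=name, repo_type="dataset"
        )
        extract_archive(local_zip, out_dir)
```

Calling `download_and_extract()` fetches all nine parts, which can be large; passing a single name from `archive_names()` to `hf_hub_download` is a lighter way to inspect the layout first.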
Provider:
deepghs
Summary of original information

Dataset overview

Task categories

  • zero-shot-image-classification

Tags

  • art

Dataset size

  • 100K<n<1M