undefined443/cc12m-wds-recaption

Name: undefined443/cc12m-wds-recaption
Creator: undefined443
Published: 2026-03-31 08:53:13
License: No description provided

Hugging Face · Updated 2026-03-31 · Indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/undefined443/cc12m-wds-recaption

Dataset description:

---
title: CC12M with Enhanced Captions
license: other
license_name: cc12m
license_link: https://github.com/google-research-datasets/conceptual-12m/blob/main/LICENSE
language:
- en
tags:
- image-text
- captions
- multimodal
- vision-language
- qwen-vl
- recaption
task_categories:
- image-to-text
- text-to-image
task_ids:
- image-captioning
pretty_name: CC12M Enhanced Captions
size_categories:
- 1M<n<10M
configs:
- config_name: default
  data_files:
  - split: train
    path: data.parquet
---

# CC12M with Enhanced Captions

This dataset contains 1.3 million image-text pairs from the CC12M dataset with model-generated captions.

## Dataset Details

- **Total Samples**: 1,306,239
- **Source**: [pixparse/cc12m-wds](https://huggingface.co/datasets/pixparse/cc12m-wds)
- **Captioning Model**: Qwen/Qwen3-VL-8B-Instruct
- **Format**: Parquet

## Filtering Criteria

Samples were filtered based on the following quality metrics:

- **Aesthetic Score**: >= 5.5 (using LAION aesthetic classifier)
- **Resolution**: >= 512 pixels (width or height)
- **Aspect Ratio**: <= 2.0

## Dataset Schema

| Column | Type | Description |
|--------|------|-------------|
| `key` | string | Original sample identifier |
| `width` | int32 | Image width in pixels |
| `height` | int32 | Image height in pixels |
| `aesthetic_score` | float32 | LAION aesthetic quality score |
| `caption` | string | Model-generated image description |

## Usage

```python
import pandas as pd
from datasets import load_dataset

# Load a locally downloaded copy of the Parquet file
# (the dataset config above names it data.parquet)
df = pd.read_parquet('data.parquet')
print(df.head())

# Or load it with the Hugging Face datasets library
dataset = load_dataset('undefined443/cc12m-wds-recaption')
```

## Citation

If you use this dataset, please cite the original CC12M paper:

```bibtex
@article{changpinyo2021cc12m,
  title={Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts},
  author={Changpinyo, Soravit and Sharma, Piyush and Ding, Nan and Soricut, Radu},
  journal={arXiv preprint arXiv:2102.08981},
  year={2021}
}
```

## License

This dataset inherits the license from the original CC12M dataset. Please refer to the [CC12M license terms](https://github.com/google-research-datasets/conceptual-12m/blob/main/LICENSE) for usage restrictions.
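## Example: Checking the Filtering Criteria

The thresholds in the Filtering Criteria section can be re-checked against the `width`, `height`, and `aesthetic_score` columns. The following is a minimal sketch, not the author's actual filtering code. It assumes that "resolution >= 512 (width or height)" means the longer side is at least 512 pixels, and that aspect ratio means longer side divided by shorter side; the card does not confirm either reading. The function name `passes_filters` and the sample rows are hypothetical.

```python
# Thresholds copied from the "Filtering Criteria" section of the card.
MIN_AESTHETIC = 5.5
MIN_SIDE = 512
MAX_ASPECT = 2.0


def passes_filters(row: dict) -> bool:
    """Return True if a row meets the published thresholds.

    Assumptions (not confirmed by the dataset card):
    - resolution is judged on the longer image side;
    - aspect ratio is longer side / shorter side.
    """
    long_side = max(row["width"], row["height"])
    short_side = min(row["width"], row["height"])
    if short_side <= 0:
        # Degenerate dimensions cannot satisfy the aspect-ratio check.
        return False
    aspect = long_side / short_side
    return (
        row["aesthetic_score"] >= MIN_AESTHETIC
        and long_side >= MIN_SIDE
        and aspect <= MAX_ASPECT
    )


# Hypothetical rows using the column names from the Dataset Schema table.
sample_rows = [
    {"key": "a", "width": 1024, "height": 768, "aesthetic_score": 6.1},
    {"key": "b", "width": 400, "height": 300, "aesthetic_score": 7.0},
    {"key": "c", "width": 2048, "height": 512, "aesthetic_score": 6.0},
    {"key": "d", "width": 800, "height": 600, "aesthetic_score": 5.0},
]

kept_keys = [row["key"] for row in sample_rows if passes_filters(row)]
print(kept_keys)
```

Since the published data is already filtered, every row should pass under the intended definitions. If many rows fail, that suggests the assumed reading of "resolution" or "aspect ratio" differs from the one used to build the dataset.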

Provided by:

undefined443
