
Alexator26/DataComp-12M-Images-256

Hugging Face · Updated 2025-12-05 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/Alexator26/DataComp-12M-Images-256
Resource description:
---
task_categories:
- image-to-text
- text-to-image
language:
- ru
- en
size_categories:
- 1M<n<10M
---

# DataComp-12M Translated

## Description

A translated version of [DataComp-12M](https://huggingface.co/datasets/mlfoundations/DataComp-12M). The English captions were machine-translated into Russian using [GigaChat3-10B-A1.8B](https://huggingface.co/ai-sage/GigaChat3-10B-A1.8B).

## Statistics

| Metric | Count |
|--------|-------|
| Original dataset | 12,561,027 |
| Successfully downloaded | 8,744,177 (69.6%) |
| Failed to download | 3,497,631 (27.8%) |
| Failed to resize | 319,219 (2.5%) |

### Full Dataset

- **`*.parquet`** - Full metadata (1,257 files, 5 corrupted)
- **`*.tar`** - Images resized to 256px

### Corrupted Parquet Files

The following 5 Parquet files are corrupted and should be skipped:

- `00122.parquet`
- `00184.parquet`
- `00322.parquet`
- `00934.parquet`
- `00956.parquet`

### Parquet Files

| Column | Description |
|--------|-------------|
| `text_en` | Original English caption |
| `caption` | Russian translation |
| `url` | Image URL |
| `key` | Unique identifier |
| `status` | Download status (`success` / `failed_to_download` / `failed_to_resize`) |
| `width` / `height` | Image dimensions |
| `original_width` / `original_height` | Original image dimensions |
| `exif` | EXIF metadata (JSON) |
| `sha256` | Image hash |
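The card does not ship loading code. The following is a minimal, stdlib-only Python sketch that applies the card's guidance: it drops the five corrupted Parquet shards, keeps `(text_en, caption)` pairs only for rows whose `status` is `success`, and recomputes the percentages in the Statistics table from the raw counts. It assumes the shards have been downloaded locally, that the table's three outcome rows correspond to the three `status` values, and that `success` rows are the ones with an image in the `*.tar` shards. The helper names (`usable_parquet_files`, `caption_pairs`, `status_shares`) are hypothetical and not part of any dataset tooling.

```python
from pathlib import Path
from typing import Iterable, Iterator, Mapping

# Shards listed on the dataset card as corrupted; they should be skipped.
CORRUPTED_PARQUET = frozenset({
    "00122.parquet",
    "00184.parquet",
    "00322.parquet",
    "00934.parquet",
    "00956.parquet",
})


def usable_parquet_files(paths: Iterable[Path]) -> list[Path]:
    """Keep only .parquet paths that are not on the corrupted list, sorted by path."""
    return sorted(
        p for p in paths
        if p.suffix == ".parquet" and p.name not in CORRUPTED_PARQUET
    )


def caption_pairs(rows: Iterable[Mapping[str, object]]) -> Iterator[tuple[str, str]]:
    """Yield (text_en, caption) pairs, i.e. (English, Russian), for successful rows.

    Assumption: status == "success" marks rows whose image is in the *.tar shards.
    """
    for row in rows:
        if row.get("status") == "success":
            yield str(row["text_en"]), str(row["caption"])


def status_shares(counts: Mapping[str, int]) -> dict[str, float]:
    """Turn per-status row counts into percentages of the total, rounded to 0.1."""
    total = sum(counts.values())
    return {status: round(100 * n / total, 1) for status, n in counts.items()}


if __name__ == "__main__":
    # Row counts per `status` value, copied from the Statistics table.
    counts = {
        "success": 8_744_177,
        "failed_to_download": 3_497_631,
        "failed_to_resize": 319_219,
    }
    print(sum(counts.values()))   # 12561027, the original dataset size
    print(status_shares(counts))  # {'success': 69.6, 'failed_to_download': 27.8, 'failed_to_resize': 2.5}
```

For example, `usable_parquet_files(Path("data").glob("*.parquet"))`, with `data` standing in for a hypothetical download directory, returns the shards that are safe to open. Those paths could then be read with a Parquet library such as `pyarrow.parquet.read_table`, and the rows (for instance via `Table.to_pylist()`) passed to `caption_pairs`. The three outcome counts sum exactly to the 12,561,027 rows of the original dataset, which is why the percentages add up to 99.9% only through rounding.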
Provider:
Alexator26