Alexator26/DataComp-12M-Images-256
Source: Hugging Face · Updated 2025-12-05 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/Alexator26/DataComp-12M-Images-256
Resource description:
---
task_categories:
- image-to-text
- text-to-image
language:
- ru
- en
size_categories:
- 1M<n<10M
---
# DataComp-12M Translated
## Description
A Russian-translated version of [DataComp-12M](https://huggingface.co/datasets/mlfoundations/DataComp-12M). The original English captions were machine-translated to Russian using [GigaChat3-10B-A1.8B](https://huggingface.co/ai-sage/GigaChat3-10B-A1.8B).
## Statistics
| Metric | Count |
|--------|-------|
| Original dataset | 12,561,027 |
| Successfully downloaded | 8,744,177 (69.6%) |
| Failed to download | 3,497,631 (27.8%) |
| Failed to resize | 319,219 (2.5%) |
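
As a quick sanity check on the table above, the following sketch recomputes each share from the raw counts and confirms that the three outcomes add up to the original dataset size. The dictionary keys reuse the `status` labels from the parquet schema; mapping each table row to a label this way is an assumption made for illustration.

```python
# Counts copied from the Statistics table.
TOTAL = 12_561_027
COUNTS = {
    "success": 8_744_177,             # "Successfully downloaded"
    "failed_to_download": 3_497_631,  # "Failed to download"
    "failed_to_resize": 319_219,      # "Failed to resize"
}


def shares(counts, total):
    """Return each count as a percentage of total, rounded to one decimal."""
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# The three outcomes partition the original dataset exactly.
assert sum(COUNTS.values()) == TOTAL

print(shares(COUNTS, TOTAL))
```

The rounded percentages sum to 99.9% rather than 100% purely because each is rounded independently.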
### Full Dataset
- **`*.parquet`** - Full metadata (1,257 files, 5 corrupted)
- **`*.tar`** - Images resized to 256px
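
The card does not document how files are named inside the `*.tar` archives, so the sketch below makes no assumption about member names: it simply walks one locally downloaded archive with Python's standard `tarfile` module and yields each regular file's name and raw bytes. The function name `iter_images` and the path passed to it are hypothetical.

```python
import tarfile


def iter_images(tar_path):
    """Yield (member_name, raw_bytes) for each regular file in the archive."""
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            # Skip directories, links, and other non-file entries.
            if member.isfile():
                yield member.name, tar.extractfile(member).read()


# Example usage (the path is a placeholder for a locally downloaded shard):
# for name, data in iter_images("path/to/shard.tar"):
#     print(name, len(data))
```

Streaming members one at a time keeps memory use bounded by the largest single image rather than the whole archive.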
### Corrupted Parquet Files
The following 5 parquet files are corrupted and should be skipped:
- `00122.parquet`
- `00184.parquet`
- `00322.parquet`
- `00934.parquet`
- `00956.parquet`
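
One way to apply the skip list is to filter file names before reading anything. This is a minimal sketch that assumes the parquet files sit in a single flat local directory; the helper name `usable_parquet_files` and the directory argument are hypothetical.

```python
from pathlib import Path

# File names taken from the list of corrupted shards above.
CORRUPTED = {
    "00122.parquet",
    "00184.parquet",
    "00322.parquet",
    "00934.parquet",
    "00956.parquet",
}


def usable_parquet_files(root):
    """Return sorted parquet paths directly under root, minus corrupted shards."""
    return sorted(
        p for p in Path(root).glob("*.parquet") if p.name not in CORRUPTED
    )
```

Filtering by name avoids opening the damaged files at all, which is simpler than catching reader errors after the fact.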
### Parquet Files
| Column | Description |
|--------|-------------|
| `text_en` | Original English caption |
| `caption` | Russian translation |
| `url` | Image URL |
| `key` | Unique identifier |
| `status` | Download status (`success` / `failed_to_download` / `failed_to_resize`) |
| `width` / `height` | Image dimensions after resizing |
| `original_width` / `original_height` | Original image dimensions |
| `exif` | EXIF metadata (JSON) |
| `sha256` | Image hash |
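
Because the metadata covers failed downloads as well as successful ones, a typical first step is to keep only rows whose `status` is `success`. The sketch below works on plain dictionaries, one per row, using the column names from the table above; how the rows are loaded from parquet is left open, and the sample rows are invented for illustration.

```python
def successful_caption_pairs(rows):
    """Yield (key, text_en, caption) for rows whose status is 'success'."""
    for row in rows:
        if row.get("status") == "success":
            yield row["key"], row["text_en"], row["caption"]


# Invented sample rows; real rows would come from a parquet reader.
sample_rows = [
    {"key": "a", "status": "success",
     "text_en": "A red car", "caption": "Красная машина"},
    {"key": "b", "status": "failed_to_download",
     "text_en": "A dog", "caption": "Собака"},
]

for key, english, russian in successful_caption_pairs(sample_rows):
    print(key, english, russian)
```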
Provider:
Alexator26



