kaupane/relaion-art-recap-zh
收藏Hugging Face2025-12-11 更新2025-12-20 收录
下载链接:
https://hf-mirror.com/datasets/kaupane/relaion-art-recap-zh
下载链接
链接失效反馈官方服务:
资源简介:
---
language:
- zh
- en
size_categories:
- 1M<n<10M
task_categories:
- text-to-image
- image-to-text
pretty_name: Relaion-Art Recaptioned (Chinese)
tags:
- art
- image-captioning
- qwen3-vl
- multimodal
---
# Dataset Card for Relaion-Art Recaptioned (Chinese)
This dataset is a recaptioned version of [laion/relaion-art](https://huggingface.co/datasets/laion/relaion-art), featuring high-quality Chinese captions generated using Alibaba's Qwen3-VL-Flash model via the DashScope Batch API.
## Dataset Description
- **Curated by:** kaupane
- **Language(s):** Chinese (captions), Multilingual (metadata)
- **License:** Inherite from source dataset
- **Source Dataset:** [laion/relaion-art](https://huggingface.co/datasets/laion/relaion-art)
- **Captioning Model:** Qwen3-VL-Flash (via Alibaba DashScope)
This dataset contains **2,007,213** image-caption pairs derived from the original Relaion-Art dataset (~8M samples). Each image has been recaptioned with detailed, accurate Chinese descriptions that highlight the subject, scene, style, and key visual details. The dataset is designed for training text-to-image models, vision-language models, and other multimodal applications requiring high-quality Chinese text-image pairs.
The significant reduction in dataset size (from ~8M to ~2M) is primarily due to:
- **Multimodal content download failures** at DashScope's server side
- **Safety filtering** applied by DashScope
- **Link rot** (broken or inaccessible image URLs)
- **Quality filtering** based on resolution (minimum 160,000 pixels) and watermark probability (< 0.6)
All filtering except resolution and watermark-based filtering occurred server-side at DashScope during the batch inference process.
## Uses
This dataset is suitable for:
- Training Chinese text-to-image diffusion models
- Fine-tuning vision-language models for Chinese image understanding
- Cross-lingual multimodal research
- Art generation and style transfer applications
- Image captioning model evaluation
## Dataset Structure
Each record in the JSONL file contains the following fields:
- **`url`**: Image URL (string)
- **`qwen-caption`**: High-quality Chinese caption generated by Qwen3-VL-Flash (string)
- **`laion-text`**: Original LAION caption (string, may be in various languages)
- **`width`**: Image width in pixels (integer)
- **`height`**: Image height in pixels (integer)
- **`language`**: Detected language of original LAION caption (string)
- **`pwatermark`**: Watermark probability score (float, 0.0-1.0)
- **`punsafe`**: NSFW probability score (float, 0.0-1.0)
- Additional metadata fields from the original dataset
### Example Record
```json
{
"url": "https://example.com/artwork.jpg",
"qwen-caption": "一幅色彩鲜艳的抽象画,画面中央是一个巨大的红色圆形,周围环绕着蓝色和黄色的几何图案,整体风格现代且充满活力。",
"laion-text": "abstract art painting",
"width": 1024,
"height": 768,
"language": "en",
"pwatermark": 0.23,
"punsafe": 0.01,
"similarity": 0.5,
"hash": 9,063,228,627,116,585,000,
"aesthetic": 8.5
}
```
## Dataset Creation
### Curation Rationale
The original Relaion-Art dataset contains artistic images but often lacks detailed, accurate captions in Chinese. This recaptioning effort aims to:
1. Provide high-quality Chinese descriptions for art-focused image generation
2. Enable better Chinese text-to-image model training
3. Improve cross-lingual multimodal understanding
### Source Data
#### Data Collection and Processing
**Source Dataset:** laion/relaion-art (train split)
**Filtering Criteria:**
- Minimum resolution: 160,000 pixels (width × height ≥ 160,000)
- Maximum watermark probability: 0.6 (pwatermark < 0.6)
- Valid image URL required
- Server-side filtering by DashScope (safety, accessibility, content validity)
**Captioning Process:**
- **Model:** Qwen3-VL-Flash via Alibaba DashScope Batch API
- **Prompt:**
```
请为这张图像生成一条高质量的中文描述,描述应准确、具体,突出画面中的主体、场景、风格和关键细节,但不宜过长,也不要加入与画面无关的推测或主观评价。不要任何开场白或解释,直接开始描述图片内容。
```
(Translation: "Please generate a high-quality Chinese description for this image. The description should be accurate and specific, highlighting the subject, scene, style, and key details in the image, but should not be too long, and should not include speculation or subjective comments unrelated to the image. Do not include any preamble or explanation, start describing the image content directly.")
- **Image Resolution:** max_pixels=65,536 (to balance quality and cost)
- **Total Cost:** 163.52 CNY (~$22.50 USD)
- **Average Token Length:** 88.8
All server-side filtering occurred automatically during DashScope's batch inference process.
#### Who are the source data producers?
The source images are from the Relaion-Art dataset, which aggregates art images from various web sources. The Chinese captions were generated by Qwen3-VL-Flash, a state-of-the-art vision-language model developed by Alibaba Cloud.
## Bias, Risks, and Limitations
**Potential Biases:**
- Captions reflect the perspective and potential biases of the Qwen3-VL-Flash model
- Source dataset may have geographic or cultural biases in art representation
- Safety filtering may have removed certain artistic content
**Limitations:**
- Captions are in Chinese only; users requiring other languages should use the original LAION captions or re-caption
- Image URLs may become invalid over time (link rot)
- Dataset size is significantly smaller than the original due to filtering
- Some artistic nuances may be lost in automated captioning
**Risks:**
- Image URLs point to external sources and may change or disappear
- Despite filtering, some inappropriate content may remain (check `punsafe` scores)
- Watermarked images are still included (filtered at pwatermark < 0.6)
提供机构:
kaupane



