Shuaimyself/MMArt
Source: Hugging Face. Updated 2026-04-10; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Shuaimyself/MMArt
Description:
---
license: cc-by-4.0
task_categories:
- image-to-text
language:
- en
tags:
- art
- multimodal
- WikiArt
- captioning
- retrieval
pretty_name: MMArt
size_categories:
- 10K<n<100K
dataset_info:
  features:
  - name: image_id
    dtype: string
  - name: title
    dtype: string
  - name: artist
    dtype: string
  - name: style
    dtype: string
  - name: date
    dtype: float32
  - name: e_narrative
    dtype: string
  - name: e_formal
    dtype: string
  - name: e_emotional
    dtype: string
  - name: e_historical
    dtype: string
  - name: dominant_emotion
    dtype: string
  - name: artemis_coverage
    dtype: bool
  - name: rag_sim
    dtype: float32
  - name: n_perspectives
    dtype: int32
  - name: e_unified
    dtype: string
  - name: image
    dtype: image
  splits:
  - name: train
    num_bytes: 10736489220
    num_examples: 74234
  download_size: 25297727696
  dataset_size: 10736489220
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# MMArt: A Multi-Perspective Multimodal Dataset for Visual Art Understanding
**MMArt** is a large-scale dataset of **74,234 WikiArt paintings**, each annotated with four independently generated interpretive perspectives — Narrative, Formal, Emotional, Historical — plus a harmonized unified caption.
> Paper under review at **ACM Multimedia 2026** (Dataset Track)
> Supplementary website: https://ShuaiWang97.github.io/MMArt
> Code: https://github.com/ShuaiWang97/MMArt
---
## Dataset Summary
Each painting is annotated by specialist models from four distinct interpretive angles, and a fifth text-only model merges them into a unified caption:
| Field | Perspective | Model |
|---|---|---|
| `e_narrative` | Narrative & Scene | Qwen3-VL-8B-Instruct |
| `e_formal` | Formal Analysis | GalleryGPT (LLaVA-7B + LoRA) |
| `e_emotional` | Emotional Response | Qwen3-VL-8B-Instruct + ARTEMIS-v2 |
| `e_historical` | Historical Context | RAG with Art Context knowledge |
| `e_unified` | Unified Caption | Qwen3-8B (vLLM) |
---
## Dataset Statistics
| Metric | Value |
|---|---|
| Total paintings | 74,234 |
| Art styles | 20 |
| Artists | 743 |
| Text fields per painting | 5 |
| Average caption length | ~70–80 words per perspective |
| ARTEMIS-v2 emotional grounding | 99.0% of paintings |
---
## Data Fields
| Field | Type | Description |
|---|---|---|
| `image_id` | string | WikiArt relative path — unique key (e.g. `Romanticism/delacroix_liberty-leading-the-people.jpg`) |
| `title` | string | Painting title |
| `artist` | string | Artist name |
| `style` | string | WikiArt style category (20 classes) |
| `date` | float | Creation date, stored as a number (`float32` in the dataset metadata) |
| `e_narrative` | string | Narrative & scene interpretation (~80 words) |
| `e_formal` | string | Formal visual analysis — composition, brushwork, palette (~80 words) |
| `e_emotional` | string | Emotional response and atmosphere (~80 words) |
| `e_historical` | string | Art-historical context and cultural meaning (~80 words) |
| `e_unified` | string | Unified caption integrating all four perspectives (~150 words) |
| `dominant_emotion` | string | Majority-vote emotion from ARTEMIS-v2 (9 categories) |
| `artemis_coverage` | bool | True if ARTEMIS-v2 utterances were available for grounding |
| `rag_sim` | float | Cosine similarity of best RAG retrieval hit for historical context |
| `n_perspectives` | int | Count of non-null perspectives (all 4 in this split) |
**Art styles:** abstract_expressionism, art_nouveau_modern, baroque, color_field_painting, cubism, early_renaissance, expressionism, fauvism, high_renaissance, impressionism, mannerism_late_renaissance, minimalism, naive_art_primitivism, northern_renaissance, pop_art, post_impressionism, realism, rococo, romanticism, ukiyo_e
**Dominant emotions:** amusement, anger, awe, contentment, disgust, excitement, fear, sadness, something else
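The `dominant_emotion` field is described as a majority vote over ARTEMIS-v2 annotations. A minimal sketch of such a vote is shown here. The function name `majority_emotion`, the sample votes, and the tie-breaking rule (first label seen wins) are assumptions for illustration, not the authors' implementation.

```python
from collections import Counter

# The nine labels listed above for `dominant_emotion`.
EMOTIONS = (
    "amusement", "anger", "awe", "contentment", "disgust",
    "excitement", "fear", "sadness", "something else",
)


def majority_emotion(labels):
    """Return the most frequent valid label, or None if there is none.

    Ties go to the label that appears first (an assumption of this sketch).
    """
    valid = [label for label in labels if label in EMOTIONS]
    if not valid:
        return None
    # Counter.most_common keeps first-seen order among equal counts.
    return Counter(valid).most_common(1)[0][0]


# Hypothetical annotator labels for a single painting.
votes = ["awe", "contentment", "awe", "sadness", "awe"]
print(majority_emotion(votes))
```

A painting with no usable annotations yields `None` in this sketch; how the dataset itself represents that case is not specified here.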
---
## Usage
```python
from datasets import load_dataset
dataset = load_dataset("Shuaimyself/MMArt")
print(dataset['train'][0])
```
**Note:** This dataset contains text annotations only. The original WikiArt images are not redistributed due to copyright. Images can be accessed via [WikiArt.org](https://www.wikiart.org) using the `image_id` field as the relative path.
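Each row behaves like a dictionary keyed by the fields in the Data Fields table, so subsets can be selected with an ordinary predicate. The sketch below runs on a few hand-written rows so that it works without downloading anything. The helper name `keep`, the sample rows, and their `image_id` values are hypothetical, and it assumes `style` uses the lowercase names listed under "Art styles".

```python
def keep(row, style, require_artemis=True):
    """True if the row has the given style and, optionally, ARTEMIS-v2 grounding."""
    if row["style"] != style:
        return False
    return row["artemis_coverage"] or not require_artemis


# Hypothetical rows, reduced to the fields this example needs.
rows = [
    {"image_id": "Romanticism/example-a.jpg", "style": "romanticism",
     "artemis_coverage": True},
    {"image_id": "Romanticism/example-b.jpg", "style": "romanticism",
     "artemis_coverage": False},
    {"image_id": "Cubism/example-c.jpg", "style": "cubism",
     "artemis_coverage": True},
]

grounded = [r["image_id"] for r in rows if keep(r, "romanticism")]
all_romantic = [r["image_id"] for r in rows
                if keep(r, "romanticism", require_artemis=False)]
print(grounded)
print(all_romantic)
```

With the `datasets` library, the same predicate can be passed to `Dataset.filter`, for example `dataset["train"].filter(lambda r: keep(r, "romanticism"))`.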
---
## Data Collection
Perspectives were generated using a multi-model pipeline on Snellius HPC (SURF):
- **Narrative & Emotional:** Qwen3-VL-8B-Instruct via vLLM, conditioned on the painting image and metadata. The Emotional perspective is additionally grounded in crowd-sourced reactions from [ARTEMIS-v2](https://www.artemisdataset-v2.org/).
- **Formal:** GalleryGPT (LLaVA-7B fine-tuned on formal art analysis).
- **Historical:** Qwen3-VL-8B-Instruct augmented with Wikipedia art-history passages retrieved via `sentence-transformers/all-MiniLM-L6-v2` (cosine similarity threshold 0.25, top-3 chunks).
- **Unified:** Qwen3-8B (text-only, vLLM) synthesizing all four perspectives into a coherent ~150-word description.
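
The retrieval step for the Historical perspective (cosine similarity threshold 0.25, top-3 chunks) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the toy 2-D vectors stand in for `all-MiniLM-L6-v2` embeddings, the function names are hypothetical, and the sketch assumes the threshold is applied before the top-3 cut with an inclusive comparison. The `rag_sim` value here mirrors the field description: the similarity of the best retrieved hit.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunk_vecs, top_k=3, threshold=0.25):
    """Return up to top_k (similarity, chunk_index) pairs with similarity >= threshold."""
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(chunk_vecs)]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # best match first
    return kept[:top_k]


# Toy vectors standing in for sentence embeddings (hypothetical data).
query = [1.0, 0.0]
chunks = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0], [0.8, 0.6], [1.0, 1.0]]

hits = retrieve(query, chunks)
rag_sim = hits[0][0] if hits else None  # similarity of the best hit
print([index for _, index in hits], rag_sim)
```

When no chunk clears the threshold, `retrieve` returns an empty list and `rag_sim` is `None` in this sketch; the dataset may encode that case differently.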
---
## License
This dataset is released under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/). Text annotations are original work by the authors. Painting images are © their respective rights holders and are not included.
---
## Citation
```bibtex
@inproceedings{wang2026mmart,
title = {MMArt: A Multi-Perspective Multimodal Dataset for Visual Art Understanding},
author = {Wang, Shuai and Ding, Wangyuan and Shen, Yixian and Huang, Jia-Hong
and Rudinac, Stevan and Kackovic, Monika and Wijnberg, Nachoem
and Worring, Marcel},
year = {2026},
}
```
Provider:
Shuaimyself



