
Shuaimyself/MMArt

Source: Hugging Face. Updated 2026-04-10; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/Shuaimyself/MMArt
Resource description:
---
license: cc-by-4.0
task_categories:
- image-to-text
language:
- en
tags:
- art
- multimodal
- WikiArt
- captioning
- retrieval
pretty_name: MMArt
size_categories:
- 10K<n<100K
dataset_info:
  features:
  - name: image_id
    dtype: string
  - name: title
    dtype: string
  - name: artist
    dtype: string
  - name: style
    dtype: string
  - name: date
    dtype: float32
  - name: e_narrative
    dtype: string
  - name: e_formal
    dtype: string
  - name: e_emotional
    dtype: string
  - name: e_historical
    dtype: string
  - name: dominant_emotion
    dtype: string
  - name: artemis_coverage
    dtype: bool
  - name: rag_sim
    dtype: float32
  - name: n_perspectives
    dtype: int32
  - name: e_unified
    dtype: string
  - name: image
    dtype: image
  splits:
  - name: train
    num_bytes: 10736489220
    num_examples: 74234
  download_size: 25297727696
  dataset_size: 10736489220
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

# MMArt: A Multi-Perspective Multimodal Dataset for Visual Art Understanding

**MMArt** is a large-scale dataset of **74,234 WikiArt paintings**, each annotated with four independently generated interpretive perspectives (Narrative, Formal, Emotional, Historical) plus a harmonized unified caption.

> Paper under review at **ACM Multimedia 2026** (Dataset Track)
>
> Supplementary website: https://ShuaiWang97.github.io/MMArt
>
> Code: https://github.com/ShuaiWang97/MMArt

---

## Dataset Summary

Each painting is annotated by specialist models from four distinct interpretive angles:

| Field | Perspective | Model |
|---|---|---|
| `e_narrative` | Narrative & Scene | Qwen3-VL-8B-Instruct |
| `e_formal` | Formal Analysis | GalleryGPT (LLaVA-7B + LoRA) |
| `e_emotional` | Emotional Response | Qwen3-VL-8B-Instruct + ARTEMIS-v2 |
| `e_historical` | Historical Context | RAG with Art Context knowledge |
| `e_unified` | Unified Caption | Qwen3-8B (vLLM) |

---

## Dataset Statistics

| Metric | Value |
|---|---|
| Total paintings | 74,234 |
| Art styles | 20 |
| Artists | 743 |
| Text fields per painting | 5 |
| Average caption length | ~70–80 words per perspective |
| ARTEMIS-v2 emotional grounding | 99.0% of paintings |

---

## Data Fields

| Field | Type | Description |
|---|---|---|
| `image_id` | string | WikiArt relative path, used as the unique key (e.g. `Romanticism/delacroix_liberty-leading-the-people.jpg`) |
| `title` | string | Painting title |
| `artist` | string | Artist name |
| `style` | string | WikiArt style category (20 classes) |
| `date` | string | Creation date or period |
| `e_narrative` | string | Narrative & scene interpretation (~80 words) |
| `e_formal` | string | Formal visual analysis: composition, brushwork, palette (~80 words) |
| `e_emotional` | string | Emotional response and atmosphere (~80 words) |
| `e_historical` | string | Art-historical context and cultural meaning (~80 words) |
| `e_unified` | string | Unified caption integrating all four perspectives (~150 words) |
| `dominant_emotion` | string | Majority-vote emotion from ARTEMIS-v2 (9 categories) |
| `artemis_coverage` | bool | True if ARTEMIS-v2 utterances were available for grounding |
| `rag_sim` | float | Cosine similarity of best RAG retrieval hit for historical context |
| `n_perspectives` | int | Count of non-null perspectives (all 4 in this split) |

**Art styles:** abstract_expressionism, art_nouveau_modern, baroque, color_field_painting, cubism, early_renaissance, expressionism, fauvism, high_renaissance, impressionism, mannerism_late_renaissance, minimalism, naive_art_primitivism, northern_renaissance, pop_art, post_impressionism, realism, rococo, romanticism, ukiyo_e

**Dominant emotions:** amusement, anger, awe, contentment, disgust, excitement, fear, sadness, something else

---

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("Shuaimyself/MMArt")
print(dataset['train'][0])
```

**Note:** This dataset contains text annotations only. The original WikiArt images are not redistributed due to copyright. Images can be accessed via [WikiArt.org](https://www.wikiart.org) using the `image_id` field as the relative path.

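As a rough illustration of working with the fields described above, the sketch below tallies `dominant_emotion` per `style` and optionally skips rows where `artemis_coverage` is false. It runs on a handful of invented rows that only mimic the documented schema; the helper name `emotion_counts_by_style` and the sample values are hypothetical and not part of the dataset or its tooling.

```python
from collections import Counter, defaultdict

# Toy rows that mimic the documented schema; the values are invented for
# illustration and are not taken from MMArt.
rows = [
    {"image_id": "impressionism/a.jpg", "style": "impressionism",
     "dominant_emotion": "contentment", "artemis_coverage": True},
    {"image_id": "impressionism/b.jpg", "style": "impressionism",
     "dominant_emotion": "awe", "artemis_coverage": True},
    {"image_id": "impressionism/c.jpg", "style": "impressionism",
     "dominant_emotion": "contentment", "artemis_coverage": False},
    {"image_id": "cubism/d.jpg", "style": "cubism",
     "dominant_emotion": "something else", "artemis_coverage": True},
]


def emotion_counts_by_style(records, grounded_only=True):
    """Count dominant_emotion values per style.

    When grounded_only is True, rows whose artemis_coverage flag is False
    are skipped, so only ARTEMIS-v2-grounded labels are counted.
    """
    counts = defaultdict(Counter)
    for row in records:
        if grounded_only and not row["artemis_coverage"]:
            continue
        counts[row["style"]][row["dominant_emotion"]] += 1
    # Convert to plain dicts so the result prints and compares cleanly.
    return {style: dict(c) for style, c in counts.items()}


summary = emotion_counts_by_style(rows)
print(summary)
```

Because each row of a loaded split is a dict keyed by field name, the same function should also accept rows iterated from `dataset['train']`, although that has not been checked against the real data here.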
---

## Data Collection

Perspectives were generated using a multi-model pipeline on Snellius HPC (SURF):

- **Narrative & Emotional:** Qwen3-VL-8B-Instruct via vLLM, conditioned on the painting image and metadata. Emotional perspective additionally grounded with crowd-sourced reactions from [ARTEMIS-v2](https://www.artemisdataset-v2.org/).
- **Formal:** GalleryGPT (LLaVA-7B fine-tuned on formal art analysis).
- **Historical:** Qwen3-VL-8B-Instruct augmented with Wikipedia art-history passages retrieved via `sentence-transformers/all-MiniLM-L6-v2` (cosine similarity threshold 0.25, top-3 chunks).
- **Unified:** Qwen3-8B (text-only, vLLM) synthesizing all four perspectives into a coherent ~150-word description.

---

## License

This dataset is released under [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/). Text annotations are original work by the authors. Painting images are © their respective rights holders and are not included.

---

## Citation

```bibtex
@inproceedings{wang2026mmart,
  title  = {MMArt: A Multi-Perspective Multimodal Dataset for Visual Art Understanding},
  author = {Wang, Shuai and Ding, Wangyuan and Shen, Yixian and Huang, Jia-Hong and Rudinac, Stevan and Kackovic, Monika and Wijnberg, Nachoem and Worring, Marcel},
  year   = {2026},
}
```
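Returning to the Historical step under Data Collection: the retrieval rule it mentions (cosine similarity threshold 0.25, top-3 chunks) can be sketched in plain Python as below. This is only an illustration under assumptions, not the authors' pipeline: the helper names `cosine` and `retrieve` are hypothetical, the vectors are toy 2-D stand-ins for real sentence embeddings, and treating the threshold as inclusive and taking the best hit's similarity as a `rag_sim`-style score are guesses about details the card does not spell out.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve(query_vec, chunk_vecs, threshold=0.25, top_k=3):
    """Return up to top_k (index, similarity) pairs with similarity >= threshold,
    best match first. The inclusive comparison is an assumption."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(chunk_vecs)]
    kept = [pair for pair in scored if pair[1] >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


# Hypothetical 2-D embeddings; real all-MiniLM-L6-v2 vectors have 384 dimensions.
query = [1.0, 0.0]
chunks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [3.0, 4.0], [-1.0, 0.0]]

hits = retrieve(query, chunks)
# A rag_sim-style score: similarity of the best hit, or None if nothing passed.
best_sim = hits[0][1] if hits else None
print(hits, best_sim)
```

In this toy run, the orthogonal and opposite chunks fall below the threshold and are dropped, and the remaining three are returned in descending order of similarity.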
Provided by: Shuaimyself