
small-models-for-glam/synthetic-aat-materials

Source: Hugging Face. Updated 2025-09-17. Indexed 2026-02-07.
Download link:
https://hf-mirror.com/datasets/small-models-for-glam/synthetic-aat-materials
Resource description:
---
language:
- en
license: mit
task_categories:
- text-generation
- conversational
tags:
- cultural-heritage
- aat
- materials
- synthetic
- qwen3
pretty_name: "Synthetic AAT Materials Dataset"
size_categories:
- 1K<n<10K
---

# Synthetic AAT Materials Dataset

## Dataset Description

This dataset contains 1000 synthetic examples of cultural heritage object descriptions paired with their materials as they would appear in the Getty Art & Architecture Thesaurus (AAT). The data is formatted for training conversational AI models, particularly Qwen3, to identify and extract materials from cultural heritage object descriptions.

## Dataset Structure

Each example contains:

- `messages`: Conversation format with user question and assistant response
- `object_description`: The original cultural heritage object description
- `materials_identified`: The extracted materials in AAT format

## Example

```json
{
  "messages": [
    {
      "role": "user",
      "content": "Given this cultural heritage object description:\n\nA bronze sculpture from 1425, standing 150 cm tall...\n\nIdentify the materials separate out materials as they would be found in Getty AAT"
    },
    {
      "role": "assistant",
      "content": "[{\"Bronze\": [\"bronze\"]}]"
    }
  ],
  "object_description": "A bronze sculpture from 1425, standing 150 cm tall...",
  "materials_identified": [{"Bronze": ["bronze"]}]
}
```

## Object Types

The dataset includes diverse cultural heritage objects:

- Paintings (oil on canvas, tempera on wood, etc.)
- Sculptures (bronze, marble, clay, wood)
- Ceramics and pottery
- Textiles and fabrics
- Metalwork and jewelry
- Glassware
- Manuscripts and prints
- Furniture and decorative objects

## Use Cases

- Training models to identify materials in cultural heritage descriptions
- Fine-tuning conversational AI for museum and archive applications
- Developing tools for cataloging cultural artifacts
- Research in digital humanities and cultural heritage informatics

## Dataset Creation

Generated using a synthetic data pipeline that:

1. Creates realistic object descriptions with randomized attributes
2. Uses LLMs to extract materials in AAT format
3. Applies expert knowledge mappings for material terminology
4. Ensures diversity across object types, time periods, and materials

## Licensing

This dataset is released under the MIT License. While the data is synthetic, it's designed to reflect real-world cultural heritage cataloging practices.
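## Usage Sketch

The record layout described under Dataset Structure can be illustrated with a short, self-contained Python sketch. It rebuilds the `messages` field from `object_description` and `materials_identified`, then decodes the assistant reply back into structured data. The helper names (`build_record`, `parse_assistant_materials`, `flatten_values`) are hypothetical. The prompt wording is copied from the example above. Serializing the answer with `json.dumps` defaults is an assumption that happens to match that example, not a documented part of the generation pipeline. The card also does not say which side of each `{key: [values]}` pair is the AAT term, so the code treats them neutrally.

```python
import json

# Prompt wording copied verbatim from the Example section of this card.
PROMPT_TEMPLATE = (
    "Given this cultural heritage object description:\n\n"
    "{description}\n\n"
    "Identify the materials separate out materials as they would be found in Getty AAT"
)


def build_record(description, materials):
    """Assemble one record in the three-field layout shown in the card.

    Assumption: the assistant turn is the JSON serialization of
    `materials` using json.dumps defaults.
    """
    return {
        "messages": [
            {"role": "user", "content": PROMPT_TEMPLATE.format(description=description)},
            {"role": "assistant", "content": json.dumps(materials)},
        ],
        "object_description": description,
        "materials_identified": materials,
    }


def parse_assistant_materials(record):
    """Decode the JSON string held in the last assistant turn."""
    assistant_turns = [m for m in record["messages"] if m["role"] == "assistant"]
    return json.loads(assistant_turns[-1]["content"])


def flatten_values(materials):
    """Collect every value listed under every key, preserving order."""
    values = []
    for group in materials:
        for _key, items in group.items():
            values.extend(items)
    return values


description = "A bronze sculpture from 1425, standing 150 cm tall..."
materials = [{"Bronze": ["bronze"]}]

record = build_record(description, materials)
parsed = parse_assistant_materials(record)

# The decoded assistant reply should agree with the structured field.
is_consistent = parsed == record["materials_identified"]
flat = flatten_values(parsed)
print(is_consistent, flat)
```

A round-trip check like `is_consistent` is one way to spot records where the chat-formatted answer and the structured `materials_identified` field have drifted apart before fine-tuning on them.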
Provider:
small-models-for-glam