james-burton/OrientalMuseum_min5-white-mat
Source: Hugging Face. Updated 2024-02-28; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/james-burton/OrientalMuseum_min5-white-mat
Resource overview:
---
dataset_info:
  features:
  - name: obj_num
    dtype: string
  - name: file
    dtype: string
  - name: image
    dtype: image
  - name: root
    dtype: string
  - name: description
    dtype: string
  - name: object_name
    dtype: string
  - name: other_name
    dtype: string
  - name: label
    dtype:
      class_label:
        names:
          '0': Animal Mummy
          '1': Batik
          '2': Buffalo Horn
          '3': Chinese Red Rosewood
          '4': Colour on Paper
          '5': Flint/Chert
          '6': Gouache on Paper
          '7': Haematite/Red Ochre
          '8': Human Bone
          '9': Ink and Colour on Paper
          '10': Ink and Colours on Silk
          '11': Ink and Opaque Watercolour on Paper
          '12': Ink on Paper
          '13': Jade (Calcified)
          '14': Japanese paper
          '15': Microcline/Green Feldspar/Amazon-Stone
          '16': Nile Mud
          '17': Opaque Watercolour on Paper
          '18': Opaque Watercolour or Gouache on Mica
          '19': Pith
          '20': Pith Paper
          '21': Plant Product
          '22': Resin/Plastic
          '23': Rhinoceros Horn
          '24': Smaragdite
          '25': Steatite
          '26': Steatite/Soap Stone
          '27': Watercolour on Rice Paper
          '28': acrylic
          '29': agate
          '30': alabaster
          '31': aluminum
          '32': amber
          '33': amethyst
          '34': antler
          '35': artificial stone
          '36': balsa
          '37': bamboo
          '38': basalt
          '39': bone
          '40': bowenite
          '41': boxwood
          '42': brass
          '43': brocade
          '44': bronze
          '45': burnt jade
          '46': canvas
          '47': cardboard
          '48': cards
          '49': carnelian
          '50': cast iron
          '51': celadon
          '52': cellulose acetate
          '53': ceramic
          '54': chalcedony
          '55': cherry
          '56': clay
          '57': cloth
          '58': coconut
          '59': copper
          '60': copper alloy
          '61': coral
          '62': cotton
          '63': crystal
          '64': diorite
          '65': dolerite
          '66': earthenware
          '67': ebony
          '68': emerald
          '69': enamel
          '70': faience
          '71': felt
          '72': flax
          '73': flint
          '74': gauze
          '75': glass
          '76': gold
          '77': granite
          '78': gray ware
          '79': hardwood
          '80': horn
          '81': incense
          '82': ink
          '83': iron
          '84': ivory
          '85': jade
          '86': jadeite
          '87': jasper
          '88': lacquer
          '89': lapis lazuli
          '90': lazurite
          '91': lead
          '92': lead alloy
          '93': leather
          '94': limestone
          '95': linen
          '96': malachite
          '97': marble
          '98': metal
          '99': mineral
          '100': mother of pearl
          '101': muslin
          '102': nephrite
          '103': nylon
          '104': obsidian
          '105': organic material
          '106': paint
          '107': palm fiber
          '108': palm leaf
          '109': paper
          '110': papier mâché
          '111': papyrus
          '112': pewter
          '113': photographic paper
          '114': pine
          '115': plant fiber
          '116': plaster
          '117': plastic
          '118': plate
          '119': polyester
          '120': polystyrene
          '121': porcelain
          '122': pottery
          '123': quartzite
          '124': rattan
          '125': realgar
          '126': reed
          '127': rice paper
          '128': rock
          '129': rush
          '130': sandstone
          '131': satin
          '132': schist
          '133': seashell
          '134': serpentine
          '135': shell
          '136': silk
          '137': siltstone
          '138': silver
          '139': skull
          '140': slate
          '141': soapstone
          '142': softwood
          '143': stalagmites
          '144': steel
          '145': stone
          '146': stoneware
          '147': straw
          '148': stucco
          '149': sycamore
          '150': synthetic fiber
          '151': teak
          '152': terracotta
          '153': textiles
          '154': tin
          '155': tortoise shell
          '156': tourmaline
          '157': travertine
          '158': tremolite
          '159': turquoise
          '160': velvet
          '161': wood
          '162': wool
          '163': wrought iron
          '164': zinc alloy
  - name: production.period
    dtype: string
  - name: production.place
    dtype: string
  - name: new_root
    dtype: string
  splits:
  - name: train
    num_bytes: 741469047.96
    num_examples: 23060
  - name: validation
    num_bytes: 168672680.74
    num_examples: 5426
  - name: test
    num_bytes: 137567256.474
    num_examples: 5426
  download_size: 950282711
  dataset_size: 1047708985.174
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: validation
    path: data/validation-*
  - split: test
    path: data/test-*
---
This dataset is primarily used for material classification. Each record holds an image together with related information: object number, file path, root directory, description, object name, other name, production period, production place, and a material label. The label feature covers 165 material classes, including Animal Mummy, Batik, Buffalo Horn, etc. The dataset is divided into train (23,060 examples), validation (5,426), and test (5,426) splits. The download size and total size of the dataset are 950282711 bytes and 1047708985.174 bytes, respectively.
Provided by:
james-burton
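Summary of original information
Dataset overview
Feature information
The dataset contains the following features:
- obj_num: string
- file: string
- image: image
- root: string
- description: string
- object_name: string
- other_name: string
- label: class label, with the following classes: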
- 0: Animal Mummy
- 1: Batik
- 2: Buffalo Horn
- 3: Chinese Red Rosewood
- 4: Colour on Paper
- 5: Flint/Chert
- 6: Gouache on Paper
- 7: Haematite/Red Ochre
- 8: Human Bone
- 9: Ink and Colour on Paper
- 10: Ink and Colours on Silk
- 11: Ink and Opaque Watercolour on Paper
- 12: Ink on Paper
- 13: Jade (Calcified)
- 14: Japanese paper
- 15: Microcline/Green Feldspar/Amazon-Stone
- 16: Nile Mud
- 17: Opaque Watercolour on Paper
- 18: Opaque Watercolour or Gouache on Mica
- 19: Pith
- 20: Pith Paper
- 21: Plant Product
- 22: Resin/Plastic
- 23: Rhinoceros Horn
- 24: Smaragdite
- 25: Steatite
- 26: Steatite/Soap Stone
- 27: Watercolour on Rice Paper
- 28: acrylic
- 29: agate
- 30: alabaster
- 31: aluminum
- 32: amber
- 33: amethyst
- 34: antler
- 35: artificial stone
- 36: balsa
- 37: bamboo
- 38: basalt
- 39: bone
- 40: bowenite
- 41: boxwood
- 42: brass
- 43: brocade
- 44: bronze
- 45: burnt jade
- 46: canvas
- 47: cardboard
- 48: cards
- 49: carnelian
- 50: cast iron
- 51: celadon
- 52: cellulose acetate
- 53: ceramic
- 54: chalcedony
- 55: cherry
- 56: clay
- 57: cloth
- 58: coconut
- 59: copper
- 60: copper alloy
- 61: coral
- 62: cotton
- 63: crystal
- 64: diorite
- 65: dolerite
- 66: earthenware
- 67: ebony
- 68: emerald
- 69: enamel
- 70: faience
- 71: felt
- 72: flax
- 73: flint
- 74: gauze
- 75: glass
- 76: gold
- 77: granite
- 78: gray ware
- 79: hardwood
- 80: horn
- 81: incense
- 82: ink
- 83: iron
- 84: ivory
- 85: jade
- 86: jadeite
- 87: jasper
- 88: lacquer
- 89: lapis lazuli
- 90: lazurite
- 91: lead
- 92: lead alloy
- 93: leather
- 94: limestone
- 95: linen
- 96: malachite
- 97: marble
- 98: metal
- 99: mineral
- 100: mother of pearl
- 101: muslin
- 102: nephrite
- 103: nylon
- 104: obsidian
- 105: organic material
- 106: paint
- 107: palm fiber
- 108: palm leaf
- 109: paper
- 110: papier mâché
- 111: papyrus
- 112: pewter
- 113: photographic paper
- 114: pine
- 115: plant fiber
- 116: plaster
- 117: plastic
- 118: plate
- 119: polyester
- 120: polystyrene
- 121: porcelain
- 122: pottery
- 123: quartzite
- 124: rattan
- 125: realgar
- 126: reed
- 127: rice paper
- 128: rock
- 129: rush
- 130: sandstone
- 131: satin
- 132: schist
- 133: seashell
- 134: serpentine
- 135: shell
- 136: silk
- 137: siltstone
- 138: silver
- 139: skull
- 140: slate
- 141: soapstone
- 142: softwood
- 143: stalagmites
- 144: steel
- 145: stone
- 146: stoneware
- 147: straw
- 148: stucco
- 149: sycamore
- 150: synthetic fiber
- 151: teak
- 152: terracotta
- 153: textiles
- 154: tin
- 155: tortoise shell
- 156: tourmaline
- 157: travertine
- 158: tremolite
- 159: turquoise
- 160: velvet
- 161: wood
- 162: wool
- 163: wrought iron
- 164: zinc alloy
- production.period: string
- production.place: string
- new_root: string
Data splits
The dataset is divided into the following splits:
- train: 23,060 examples, 741469047.96 bytes
- validation: 5,426 examples, 168672680.74 bytes
- test: 5,426 examples, 137567256.474 bytes
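As a quick consistency check on the split figures listed above, the example counts can be turned into proportions and the per-split byte counts summed. This is only a sketch over the numbers given in the metadata; the variable names are illustrative.

```python
from decimal import Decimal

# Example counts and byte sizes copied from the split metadata.
SPLITS = {
    "train": (23060, Decimal("741469047.96")),
    "validation": (5426, Decimal("168672680.74")),
    "test": (5426, Decimal("137567256.474")),
}

total_examples = sum(n for n, _ in SPLITS.values())
total_bytes = sum(b for _, b in SPLITS.values())

# Share of examples per split, in percent, rounded to one decimal place.
shares = {
    name: round(100 * n / total_examples, 1)
    for name, (n, _) in SPLITS.items()
}

print(total_examples)
print(shares)
print(total_bytes)
```

The counts add up to 33,912 examples, split roughly 68/16/16, and the byte counts sum to the dataset_size value reported in the metadata.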
Dataset size
- Download size: 950282711 bytes
- Total dataset size: 1047708985.174 bytes
Configuration
- config_name: default
- Data files:
  - train: data/train-*
  - validation: data/validation-*
  - test: data/test-*
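Collected summary
Dataset introduction

Construction
Against the backdrop of the digital preservation of cultural heritage, the OrientalMuseum_min5-white-mat dataset was built through systematic collection and annotation. It brings together multimodal information on Oriental Museum holdings: each record contains an object image, object number, name, description, and a material class label. According to this summary, construction followed strict academic standards: key metadata were extracted from the original collection records, and material attributes were annotated in detail under a specialist classification scheme, to keep the data sources reliable and the annotations accurate. The dataset is split into training, validation, and test sets of 23,060, 5,426, and 5,426 examples (roughly 68/16/16), giving later computational research a structured basis.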
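Characteristics
The dataset's defining feature is the breadth of its material classes. The label set contains 165 specific materials, from Animal Mummy and Batik to jade, ceramic, and metal, reflecting the material diversity of Oriental artifacts. Each record pairs an image with a text description, giving an aligned multimodal structure that suits cross-modal learning. With about 34,000 examples in total (23,060 of them for training), the dataset is moderate in size, balancing coverage against processing cost, and is suited to material recognition, artifact classification, and cultural heritage analysis.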
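Usage
Researchers can load the dataset directly from the Hugging Face platform and use its predefined train, validation, and test splits for supervised learning. It suits image classification, material recognition, and joint multimodal modeling. The 'label' field can serve as the target for material prediction, and text fields such as 'description' and 'object_name' can be combined with it for cross-modal analysis. The data are stored as standard images plus structured fields and are compatible with mainstream deep learning frameworks, which makes end-to-end training and evaluation straightforward.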
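A minimal loading sketch, assuming the Hugging Face `datasets` library is installed and the repository is reachable under the ID shown in the download link. `load_splits`, `describe`, and `main` are illustrative helper names, not part of the dataset or the library.

```python
REPO_ID = "james-burton/OrientalMuseum_min5-white-mat"


def load_splits(repo_id=REPO_ID):
    """Download the dataset and return its train/validation/test splits."""
    # Imported lazily so that describe() works without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id)


def describe(example, label_names):
    """Format one record as 'obj_num: object_name -> material'."""
    material = label_names[example["label"]]
    return f"{example['obj_num']}: {example['object_name']} -> {material}"


def main():
    ds = load_splits()
    label_feature = ds["train"].features["label"]  # a ClassLabel feature
    print(label_feature.num_classes)
    print(describe(ds["train"][0], label_feature.names))
```

Calling `main()` triggers the download (about 950 MB according to the metadata), so it is not run automatically here. A single split can also be requested with `load_dataset(REPO_ID, split="validation")`.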
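Background and challenges
Background overview
This dataset for recognizing the materials of Oriental Museum artifacts sits at the intersection of cultural heritage digitization and artificial intelligence, and addresses the complexity of material classification in museum collection management. It was built by the researcher James Burton and colleagues and focuses on integrating multimodal data about Oriental Museum holdings. The core research question is how to identify and classify artifact materials accurately with machine learning, in support of artifact authentication, conservation, and digital display. It covers 165 material classes, from Animal Mummy to various minerals and textiles, providing a data foundation for cultural heritage preservation and for work that combines computer vision with museology.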
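Current challenges
The dataset targets automatic recognition of artifact materials, a difficult problem because the material classes are highly diverse and often very similar to one another: different minerals or textiles, for example, differ only subtly in appearance, which makes them hard for a model to separate. On the construction side, the main difficulty is that annotation demands specialist knowledge, since precise material attribution depends on artifact experts. In addition, artifact images often suffer from uneven lighting, background clutter, or partial occlusion, which complicates data cleaning and standardization.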
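The similarity between classes is visible in the label names themselves: several classes in the metadata share a material term (for example 'flint' and 'Flint/Chert'). The sketch below groups a small excerpt of the class names by shared term. The tokenization rule (lower-casing, splitting on '/' and parentheses, dropping spaces) is an assumption made for illustration, not part of the dataset.

```python
import re
from collections import defaultdict

# A small excerpt of class names taken from the dataset_info metadata.
LABEL_EXCERPT = [
    "Flint/Chert", "flint",
    "Steatite", "Steatite/Soap Stone", "soapstone",
    "Resin/Plastic", "plastic",
    "jade", "Jade (Calcified)",
    "bronze",
]


def tokens(name):
    """Lower-case a class name and split it on '/' and parentheses."""
    parts = re.split(r"[/()]", name.lower())
    return {p.strip().replace(" ", "") for p in parts if p.strip()}


def overlapping_labels(names):
    """Map each shared term to the sorted class names that contain it."""
    groups = defaultdict(set)
    for name in names:
        for tok in tokens(name):
            groups[tok].add(name)
    return {tok: sorted(g) for tok, g in groups.items() if len(g) > 1}


for term, names in sorted(overlapping_labels(LABEL_EXCERPT).items()):
    print(term, names)
```

Groups like these could be inspected before training to decide whether overlapping classes should be evaluated separately or merged; that choice is left to the user.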
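Common scenarios
Typical use cases
In cultural heritage digitization and museology, the OrientalMuseum_min5-white-mat dataset is a key resource for artifact material recognition. With its annotated artifact images and material labels, it supports computer vision models on the task of classifying the materials of Oriental Museum holdings against complex backgrounds. It is particularly suited to fine-grained, many-class recognition, such as distinguishing traditional craft materials like jade, ceramics, and textiles.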
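Practical applications
In museum digital management, the dataset can support intelligent curation systems and the automated construction of artifact records. A material recognition model can help curators quickly shortlist exhibits made of a given material and refine exhibition themes. It can also provide a reference for tracing materials during restoration, making conservation work more rigorous and efficient.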
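The shortlisting step described above can be sketched as a filter over model predictions. The prediction records, the `score` field, and the `min_score` threshold below are hypothetical placeholders; only the first three class names come from the dataset metadata.

```python
# First three class names from the metadata, in index order.
LABEL_NAMES = ["Animal Mummy", "Batik", "Buffalo Horn"]


def shortlist(predictions, label_names, material, min_score=0.5):
    """Return predictions for one material, most confident first."""
    wanted = label_names.index(material)  # ValueError if the name is unknown
    hits = [
        p for p in predictions
        if p["label"] == wanted and p["score"] >= min_score
    ]
    return sorted(hits, key=lambda p: p["score"], reverse=True)


# Made-up predictions, purely to exercise the function.
example_predictions = [
    {"obj_num": "A", "label": 1, "score": 0.9},
    {"obj_num": "B", "label": 1, "score": 0.3},
    {"obj_num": "C", "label": 2, "score": 0.8},
    {"obj_num": "D", "label": 1, "score": 0.7},
]

for hit in shortlist(example_predictions, LABEL_NAMES, "Batik"):
    print(hit["obj_num"], hit["score"])
```

With a loaded `datasets.Dataset`, a similar predicate on the ground-truth 'label' field could be passed to `Dataset.filter`.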
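Derived and related work
According to this summary, research derived from the dataset centers on frameworks for cross-modal artifact analysis. For example, joint learning models that combine material labels with an artifact's period and place of production have advanced multi-attribute artifact retrieval, and applying generative adversarial networks to the dataset has advanced texture synthesis for artifact materials and the restoration of damaged regions.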
The content above was collected and summarized by 遇见数据集.