james-burton/OrientalMuseum_min5-name-text
Source: Hugging Face. Updated 2024-02-28; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/james-burton/OrientalMuseum_min5-name-text
Overview:
The dataset has ten features: obj_num, file, image, root, description, label, other_name, material, production.period, and production.place. The label feature is a class label with 243 classes (indices 0 to 242) that range from artworks to everyday objects, for example Album Painting, Animal Mummy, Oracle Bone, Thangka, Woodblock Print, amulets, canopic jars, netsukes, snuff bottles, and ushabti; the complete list of label names is given in the feature listing below. The dataset is divided into training, validation, and test sets with 7168, 1687, and 1687 samples respectively.
Provider:
james-burton
Summary of original information
Dataset information
Features
- obj_num: string
- file: string
- image: image
- root: string
- description: string
- label: class label
- class_label:
- names:
- 0: Album Painting
- 1: Animal Figurine
- 2: Animal Mummy
- 3: Animal bone
- 4: Axe Head
- 5: Belt Hook
- 6: Blouse
- 7: Bolt
- 8: Box
- 9: Brush Pot
- 10: Cap
- 11: Case
- 12: Clay pipe (smoking)
- 13: Cosmetic and Medical Equipment and Implements
- 14: Cup And Saucer
- 15: DVDs
- 16: Dagger
- 17: Disc
- 18: Domestic Equipment and Utensils
- 19: Earring
- 20: Finger Ring
- 21: Funerary Cone
- 22: Funerary goods
- 23: Funerary money
- 24: Hanging
- 25: Heart Scarab
- 26: Human Figurine
- 27: Incense Holder
- 28: Inkstick
- 29: Kite
- 30: Kohl Pot
- 31: Letter
- 32: Manuscript Page
- 33: Mat
- 34: Mica Painting
- 35: Miniature Painting
- 36: Mortar
- 37: Mummy Label
- 38: Oracle Bone
- 39: Ostraka
- 40: Palette
- 41: Panel
- 42: Part
- 43: Pencase
- 44: Pendant
- 45: Pipe
- 46: Pith Painting
- 47: Plaque
- 48: Plate
- 49: Prayer Wheel
- 50: Scarab Seal
- 51: Scarf
- 52: Screen
- 53: Seal
- 54: Slide
- 55: Stand
- 56: Table
- 57: Thangka
- 58: Tomb Model
- 59: Water Dropper
- 60: Water Pot
- 61: Woodblock Print
- 62: accessories
- 63: albums
- 64: altar components
- 65: amulets
- 66: animation cels
- 67: animation drawings
- 68: armor
- 69: arrowheads
- 70: axes
- 71: axes: woodworking tools
- 72: badges
- 73: bags
- 74: bandages
- 75: baskets
- 76: beads
- 77: bells
- 78: belts
- 79: blades
- 80: board games
- 81: books
- 82: bottles
- 83: bowls
- 84: boxes
- 85: bracelets
- 86: brick
- 87: brooches
- 88: brush washers
- 89: buckets
- 90: buckles
- 91: calligraphy
- 92: candleholders
- 93: canopic jars
- 94: cards
- 95: carvings
- 96: chains
- 97: chessmen
- 98: chopsticks
- 99: claypipe
- 100: cloth
- 101: clothing
- 102: coats
- 103: coins
- 104: collar
- 105: compact discs
- 106: containers
- 107: coverings
- 108: covers
- 109: cups
- 110: deity figurine
- 111: diagrams
- 112: dishes
- 113: dolls
- 114: drawings
- 115: dresses
- 116: drums
- 117: earrings
- 118: embroidery
- 119: ensembles
- 120: envelopes
- 121: equipment for personal use: grooming, hygiene and health care
- 122: ewers
- 123: fans
- 124: figures
- 125: figurines
- 126: finials
- 127: flags
- 128: flasks
- 129: fragments
- 130: furniture components
- 131: gameboards
- 132: gaming counters
- 133: glassware
- 134: gongs
- 135: hair ornaments
- 136: hairpins
- 137: handles
- 138: harnesses
- 139: hats
- 140: headdresses
- 141: heads
- 142: incense burners
- 143: inlays
- 144: jackets
- 145: jars
- 146: jewelry
- 147: juglets
- 148: jugs
- 149: keys
- 150: kimonos
- 151: knives
- 152: lamps
- 153: lanterns
- 154: lids
- 155: maces
- 156: masks
- 157: medals
- 158: miniatures
- 159: mirrors
- 160: models
- 161: mounts
- 162: nails
- 163: necklaces
- 164: needles
- 165: netsukes
- 166: ornaments
- 167: pages
- 168: paintings
- 169: paper money
- 170: papyrus
- 171: pendants
- 172: petticoats
- 173: photographs
- 174: pictures
- 175: pins
- 176: playing cards
- 177: poker
- 178: postage stamps
- 179: postcards
- 180: posters
- 181: pots
- 182: pottery
- 183: printing blocks
- 184: prints
- 185: puppets
- 186: purses
- 187: reliefs
- 188: rings
- 189: robes
- 190: rubbings
- 191: rugs
- 192: sandals
- 193: saris
- 194: sarongs
- 195: sashes
- 196: saucers
- 197: scabbards
- 198: scaraboids
- 199: scarabs
- 200: scepters
- 201: scrolls
- 202: seed
- 203: seppa
- 204: shadow puppets
- 205: shawls
- 206: shell
- 207: sherds
- 208: shields
- 209: shoes
- 210: situlae
- 211: sketches
- 212: skirts
- 213: snuff bottles
- 214: socks
- 215: spatulas
- 216: spoons
- 217: statues
- 218: statuettes
- 219: stelae
- 220: straps
- 221: studs
- 222: swords
- 223: tablets
- 224: tacks
- 225: tea bowls
- 226: teapots
- 227: tiles
- 228: tools
- 229: toys
- 230: trays
- 231: tubes
- 232: tweezers
- 233: underwear
- 234: unidentified
- 235: ushabti
- 236: utensils
- 237: vases
- 238: vessels
- 239: weight
- 240: weights
- 241: whorls
- 242: wood blocks
- other_name: string
- material: string
- production.period: string
- production.place: string
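Several label names in the list above differ only in capitalization or grammatical number, for example `Earring` (19) and `earrings` (117), or `Box` (8) and `boxes` (84). The sketch below shows one way such near-duplicates could be surfaced. It works on a hand-copied excerpt of the list, and the `normalize` rule (lowercase, then strip a plural ending) is a crude illustrative assumption rather than anything the dataset defines.

```python
from collections import defaultdict

# Excerpt of the id -> name mapping, copied from the label list above.
LABEL_NAMES = {
    8: "Box",
    19: "Earring",
    44: "Pendant",
    57: "Thangka",
    84: "boxes",
    117: "earrings",
    171: "pendants",
    199: "scarabs",
}


def normalize(name: str) -> str:
    """Crude singularization (an assumption): lowercase and strip a plural ending."""
    key = name.lower()
    if key.endswith("xes"):  # "boxes" -> "box"
        return key[:-2]
    if key.endswith("s"):  # "earrings" -> "earring"
        return key[:-1]
    return key


def near_duplicates(names: dict) -> dict:
    """Group label ids whose normalized names coincide; keep groups of two or more."""
    groups = defaultdict(list)
    for idx, name in sorted(names.items()):
        groups[normalize(name)].append(idx)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


DUPLICATES = near_duplicates(LABEL_NAMES)
print(DUPLICATES)
```

Whether such pairs should be merged is a modeling decision; the listing itself keeps them as separate classes.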
Data splits
- train:
- Bytes: 941952733.7932324
- Examples: 7168
- validation:
- Bytes: 183440438.5883838
- Examples: 1687
- test:
- Bytes: 190577197.6163838
- Examples: 1687
Dataset size
- Download size (bytes): 1186854394
- Dataset size (bytes): 1315970369.998
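The split figures above can be checked with a little arithmetic. The sketch below uses only numbers from this listing; it computes each split's share of the examples and confirms that the per-split byte counts add up to the reported dataset size.

```python
# Example counts and byte sizes as reported in the listing above.
SPLIT_SIZES = {"train": 7168, "validation": 1687, "test": 1687}
SPLIT_BYTES = {
    "train": 941952733.7932324,
    "validation": 183440438.5883838,
    "test": 190577197.6163838,
}
DATASET_SIZE = 1315970369.998

TOTAL = sum(SPLIT_SIZES.values())
FRACTIONS = {name: count / TOTAL for name, count in SPLIT_SIZES.items()}
BYTES_SUM = sum(SPLIT_BYTES.values())

for name, frac in FRACTIONS.items():
    print(f"{name}: {SPLIT_SIZES[name]} examples ({frac:.1%})")
print(f"total examples: {TOTAL}")
print(f"per-split bytes match dataset size: {abs(BYTES_SUM - DATASET_SIZE) < 1}")
print(f"average bytes per example: {DATASET_SIZE / TOTAL:.0f}")
```

The split works out to roughly 68% / 16% / 16%, and the average record, which includes the image, is on the order of 125 kB.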
Configuration
- config_name: default
- data_files:
- split: train
- path: data/train-*
- split: validation
- path: data/validation-*
- split: test
- path: data/test-*
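The `data_files` entries in the configuration map each split to a glob pattern. As a rough illustration, the sketch below approximates that matching with Python's `fnmatch`. The shard file names are hypothetical, and the Hub's own pattern resolution may differ in detail from `fnmatch`.

```python
from fnmatch import fnmatch

# Split -> pattern, as given in the configuration above.
DATA_FILES = {
    "train": "data/train-*",
    "validation": "data/validation-*",
    "test": "data/test-*",
}

# Hypothetical repository contents, for illustration only.
EXAMPLE_FILES = [
    "data/train-00000-of-00002.parquet",
    "data/train-00001-of-00002.parquet",
    "data/validation-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]


def assign_splits(files, patterns):
    """Return {split: [matching paths]} using shell-style wildcard matching."""
    assigned = {split: [] for split in patterns}
    for path in files:
        for split, pattern in patterns.items():
            if fnmatch(path, pattern):
                assigned[split].append(path)
    return assigned


ASSIGNED = assign_splits(EXAMPLE_FILES, DATA_FILES)
for split, paths in ASSIGNED.items():
    print(split, paths)
```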
Collected summary
Dataset introduction

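Construction
In Oriental art and archaeology, datasets are typically built from the digitized records of museum collections. The OrientalMuseum_min5-name-text dataset draws on the physical artifacts of the Oriental Museum in the UK and combines each object's number, image, descriptive text, and several metadata fields. According to this summary, more than ten thousand artifacts were photographed at high resolution, and each was labeled with a fine-grained category under a specialist classification scheme. The categories range from paintings and sculpture to everyday utensils, 243 classes in total (label indices 0 to 242). The data were cleaned and checked so that descriptions match their images, then divided into training, validation, and test sets (7168, 1687, and 1687 samples), giving a structured multimodal basis for later computational analysis.
Usage
Researchers can use the dataset for a range of tasks, particularly where computer vision and natural language processing meet. For visual classification, the image and label fields can be used directly to train a model that recognizes artifact types. For cross-modal learning, aligning each image with its text description supports image-text retrieval. The material and period metadata also allow style analysis, provenance tracing, or clustering by historical period. The dataset is already split into training, validation, and test sets, so it supports end-to-end model development and evaluation. It can be loaded through the Hugging Face platform, and the required fields can then be extracted and fed into a deep learning pipeline.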
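A minimal sketch of the loading step described above, assuming the Hugging Face `datasets` library is installed and the Hub is reachable. `load_split` and `build_caption` are hypothetical helper names introduced here, and the sample record is made up for illustration. The field names come from the feature listing on this page.

```python
REPO_ID = "james-burton/OrientalMuseum_min5-name-text"

# Text-bearing fields from the feature listing.
TEXT_FIELDS = (
    "description",
    "other_name",
    "material",
    "production.period",
    "production.place",
)


def load_split(split: str = "train"):
    """Load one split from the Hub (requires `pip install datasets` and network access)."""
    from datasets import load_dataset  # lazy import so build_caption works offline

    return load_dataset(REPO_ID, split=split)


def build_caption(example: dict) -> str:
    """Join the non-empty text fields of one record into a single caption string."""
    parts = []
    for field in TEXT_FIELDS:
        value = example.get(field)
        if value:  # skips None and empty strings
            parts.append(f"{field}: {value}")
    return "; ".join(parts)


# Made-up record for illustration; not taken from the dataset.
SAMPLE = {
    "obj_num": "EX.0001",
    "description": "Blue glazed bowl",
    "other_name": None,
    "material": "ceramic",
    "production.period": "",
    "production.place": "China",
    "label": 83,
}
CAPTION = build_caption(SAMPLE)
print(CAPTION)

# With network access, usage might look like:
#   ds = load_split("train")
#   label_name = ds.features["label"].int2str(ds[0]["label"])
```

Captions built this way could serve as the text side of an image-text retrieval setup; which fields to include is a design choice left to the user.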
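Background and challenges
Background overview
At the intersection of cultural heritage digitization and artificial intelligence, the Oriental Museum artifact dataset (OrientalMuseum_min5-name-text) supports systematic computational analysis of Asian art and archaeological collections. It was compiled by James Burton and others from the museum's holdings, and it is intended to support research on automatic artifact classification and recognition with multimodal data (images, text descriptions, and material and period information). The core research question is how machine learning can handle a highly diverse set of artifact categories: more than two hundred fine-grained classes ranging from paintings and sculpture to everyday utensils. The dataset offers art history, archaeology, and digital humanities a large, structured benchmark, and it advances the use of intelligent technology in heritage conservation and knowledge discovery.
Current challenges
The dataset targets automatic artifact classification and recognition. The difficulty lies in the extreme diversity of categories (more than two hundred, including thangkas, oracle bones, and pottery figurines) and in the subtle visual differences between classes, which demand strong semantic understanding and cross-class discrimination from a model. Construction faced its own challenges, mainly from heterogeneous sources. Image quality depends on shooting conditions and on each object's state of preservation. Text fields such as other_name, material, and production.period are often missing or non-standardized. Class sizes also had to be balanced by filtering with a minimum-sample threshold so that the data remain representative. Together these factors complicate data cleaning, label alignment, and quality control.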
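The minimum-sample threshold mentioned above can be sketched as follows. This is an illustration under assumptions, not the authors' pipeline: the threshold of 5 is inferred only from the "min5" in the dataset name, and the records are toy data.

```python
from collections import Counter

MIN_COUNT = 5  # assumption: inferred from "min5" in the dataset name


def filter_rare_classes(records, min_count=MIN_COUNT):
    """Keep only records whose label occurs at least `min_count` times."""
    counts = Counter(record["label"] for record in records)
    kept = {label for label, n in counts.items() if n >= min_count}
    return [record for record in records if record["label"] in kept]


# Toy records: two classes meet the threshold, one does not.
TOY_RECORDS = (
    [{"label": "bowls"} for _ in range(6)]
    + [{"label": "gongs"} for _ in range(5)]
    + [{"label": "Kite"} for _ in range(2)]
)

FILTERED = filter_rare_classes(TOY_RECORDS)
print(len(TOY_RECORDS), "->", len(FILTERED))
print(sorted({record["label"] for record in FILTERED}))
```

Filtering like this removes classes too small to appear in every split, at the cost of dropping rare object types from the benchmark.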
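Common scenarios
Classic use cases
At the intersection of cultural heritage digitization and artificial intelligence, OrientalMuseum_min5-name-text supports multimodal visual recognition tasks. It brings together images and text descriptions of objects in the Oriental Museum across more than two hundred fine-grained categories, from paintings and sculpture to everyday utensils. Its classic use is training deep learning models to classify and annotate artifact images automatically. By combining visual features with text metadata, researchers can build cross-modal retrieval systems for managing large artifact collections and discovering knowledge in them, which opens new directions for museology and digital humanities research.
Academic problems addressed
The dataset addresses two academic challenges in cultural heritage research: complex artifact classification schemes and the high cost of annotation. By providing large-scale, multi-class labeled data, it targets the limited accuracy of traditional methods on fine-grained artifact recognition and supports comparison of object styles across periods and regions. Its significance is that it establishes a standardized multimodal benchmark for artifact data, encourages collaboration between computer vision, archaeology, and art history, and lays a data foundation for the digital preservation and intelligent analysis of cultural heritage.
Practical applications
In practice, the dataset can support smarter museum operations and public services. Models trained on it can help curatorial staff catalogue artifacts and plan exhibitions more quickly, and can improve storeroom management. They can also be integrated into online guide systems so that visitors retrieve artifact details by image. In education, the dataset can support immersive cultural learning tools that let users explore the history of Oriental art visually, helping to popularize and pass on cultural heritage.
Recent research on the dataset
Latest research directions
In cultural heritage digitization, OrientalMuseum_min5-name-text, with its images of Oriental artifacts and multimodal annotations, is driving exploratory work on cross-modal retrieval and intelligent classification. Its fine-grained categories, from paintings and sculpture to everyday objects, give deep learning models a resource for recognizing and understanding the form, material, and historical context of artifacts. Current research focuses on vision-text alignment for automatic description generation and style transfer of artifact images, in support of digital museum management and heritage protection. As digital humanities gains attention worldwide, datasets of this kind are expected to contribute to artifact knowledge graphs and immersive exhibition experiences, bringing new technical momentum to the study of Oriental art.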
The content above was collected and summarized by 遇见数据集.



