PlutonicRocks-13: An Imbalanced Image Dataset of Plutonic Rocks

Science Data Bank; updated 2025-11-12; indexed 2026-04-23
Download link:
https://www.scidb.cn/detail?dataSetId=090f1c9e55494dd084c45ed74a7d978f
Description:
Rock image recognition is a fundamental skill for geologists. With the rise of artificial intelligence (AI), a key challenge and opportunity in the geosciences lies in translating expert geological knowledge into AI models that deliver intelligent lithological recognition services, enabling geoscience enthusiasts and non-geologists to identify rock types accurately. In natural environments, the spatial distribution of surface rocks is highly heterogeneous, so rock image datasets typically follow a long-tailed distribution. Taking plutonic rocks as an example, this study presents PlutonicRocks-13, an imbalanced dataset for rock image recognition. The dataset covers 13 common types of plutonic rocks, with 51,800 images in total and a total size of 824 MB. The rock types are largely consistent with the common categories defined in the International Union of Geological Sciences (IUGS) classification scheme for plutonic rocks, including peridotite, pyroxenite, hornblendite, gabbro, diorite, monzonite, syenite, nepheline syenite, granodiorite, monzogranite, syenogranite, and plagiogranite. Rock images were collected primarily from two sources: field outcrops, and hand specimens curated by geological institutions and museums. After careful screening, processing, and annotation, these images were compiled into PlutonicRocks-13, a dataset tailored for rock image classification. Furthermore, by converting the annotated labels into question-answer pairs, the dataset can be used for instruction tuning of multimodal models, enabling them to classify rock images from natural language instructions. The dataset provides reliable data support for research on automated rock image recognition and is a useful reference for geological surveys, surficial substrate investigations, and public geoscience education.
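
The conversion of annotated labels into question-answer pairs mentioned above might look roughly like the following. This is a minimal sketch, not the authors' pipeline: the function name `build_qa_pairs`, the `(image_path, label)` input layout, the question wording, and the output field names are all assumptions made for illustration, since the description does not specify the dataset's file structure or prompt format.

```python
import json


def build_qa_pairs(samples):
    """Turn (image_path, label) tuples into question-answer records.

    The input layout and the record fields ("image", "question", "answer")
    are hypothetical; adapt them to the actual dataset files and to the
    format expected by the multimodal model being tuned.
    """
    samples = list(samples)
    # Derive the candidate classes from the labels that are actually present.
    classes = sorted({label for _, label in samples})
    options = ", ".join(classes)
    question = (
        "Which type of plutonic rock is shown in this image? "
        f"Options: {options}."
    )
    return [
        {"image": path, "question": question, "answer": label}
        for path, label in samples
    ]


if __name__ == "__main__":
    # Placeholder paths and labels, not real files from the dataset.
    demo = [
        ("images/gabbro/0001.jpg", "gabbro"),
        ("images/diorite/0001.jpg", "diorite"),
        ("images/gabbro/0002.jpg", "gabbro"),
    ]
    for record in build_qa_pairs(demo):
        print(json.dumps(record))
```

Listing the candidate classes in the question keeps the answer space closed, which makes the model's free-text output easy to score against the original label. Because the dataset is imbalanced, the resulting records inherit the same long-tailed label distribution unless they are resampled or reweighted.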
Provided by:
Geological Survey of Anhui Province (Anhui Institute of Geological Sciences); Chen Zhongliang; Hefei University of Technology
Created:
2025-11-12