Data for manuscript "Efficient generation of training libraries for image classification models from images of herbarium specimens"

Source: Research Data Australia (indexed 2024-12-14)
Download link:
https://researchdata.edu.au/data-manuscript-efficient-herbarium-specimens/3377349
Resource description:
Images of 33 herbarium sheets used to demonstrate the efficient creation of training libraries for image classification, the labelImg annotation files, and the cropped images resulting from the process. The image classification model was trained with Microsoft Custom Vision.

Lineage: The starting point of the project was a set of high-resolution images of 33 herbarium sheets of Waitzia, taken during the digitisation of all Asteraceae specimens at the Australian National Herbarium (CANB). We used labelImg (github.com/heartexlabs/labelImg) to mark flowerheads across all specimens with bounding boxes. To create a negative category, we also drew 96 bounding boxes around parts of selected specimen photos that show no flowerheads, such as leaves and branches, stamps, labels, colour charts, barcodes, and empty cardboard.

labelImg saves annotations in XML format. Using a simple Python 3 script, we read all annotation files, cropped out the image regions indicated by the annotations, and exported the crops to an output directory.

We created a general (compact) multiclass image classification project in Microsoft Custom Vision and uploaded all cropped images as training and testing data, tagging each as the appropriate taxon or as Negative. A single training session was conducted with automatic termination.
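The cropping step described above can be sketched in Python 3. This is not the authors' script. It is a minimal sketch under several assumptions: labelImg's default Pascal VOC XML output (one `.xml` file per image, each `<object>` holding a `<name>` label and a `<bndbox>` with `xmin`, `ymin`, `xmax`, `ymax`), the image named in `<filename>` sitting in `image_dir`, and the third-party Pillow library being installed. The function names, the per-label output folders and the file-naming scheme are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def parse_voc_annotation(xml_text):
    """Return (image filename, [(label, (xmin, ymin, xmax, ymax)), ...]) for one
    Pascal VOC XML annotation, the format labelImg writes by default."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    boxes = []
    for obj in root.iter("object"):
        label = obj.findtext("name")  # e.g. a taxon name or "Negative"
        bb = obj.find("bndbox")
        # Coordinates are pixel values; int(float(...)) also accepts "12.0".
        coords = tuple(
            int(float(bb.findtext(k))) for k in ("xmin", "ymin", "xmax", "ymax")
        )
        boxes.append((label, coords))
    return filename, boxes


def crop_all(annotation_dir, image_dir, output_dir):
    """Crop every annotated box and save it as <output_dir>/<label>/<stem>_<i>.jpg.
    Returns the number of crops written (hypothetical helper)."""
    # Assumption: Pillow is installed (pip install Pillow). It is imported here so
    # that parse_voc_annotation can be used without it.
    from PIL import Image

    out = Path(output_dir)
    count = 0
    for xml_path in sorted(Path(annotation_dir).glob("*.xml")):
        filename, boxes = parse_voc_annotation(xml_path.read_text(encoding="utf-8"))
        with Image.open(Path(image_dir) / filename) as img:
            for i, (label, box) in enumerate(boxes):
                label_dir = out / label
                label_dir.mkdir(parents=True, exist_ok=True)
                # Pillow's crop box is (left, upper, right, lower), matching VOC order.
                crop = img.crop(box)
                crop.convert("RGB").save(label_dir / f"{Path(filename).stem}_{i:03d}.jpg")
                count += 1
    return count
```

For example, `crop_all("annotations", "images", "crops")` would place each crop in a subfolder named after its label. Grouping crops by label this way is one convenient layout for tagging images as a taxon or as Negative when uploading them for training.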

Provided by:
Commonwealth Scientific and Industrial Research Organisation