AI and paleontology: Effects of vertebrate fossil sample size on machine learning image classification

Name: AI and paleontology: Effects of vertebrate fossil sample size on machine learning image classification
Creator: Dryad
Published: 2026-03-05 23:16:59
License: No description available

Source: DataCite Commons. Updated: 2026-03-05. Indexed: 2026-04-25.

Download link:

https://datadryad.org/dataset/doi:10.5061/dryad.zpc866tpq

Description:

With the growing application of artificial intelligence (AI) and machine learning (ML), there is great potential to leverage these technologies in paleontology. Relative to many other scientific fields, a challenge in applying ML to paleontology is small sample sizes, particularly for fossil vertebrates. Shark teeth, abundant in the fossil record, provide a model system for testing ML across varying sample sizes. Here we use a curated dataset of 3150 images covering six classes (taxa) of Neogene shark teeth for taxonomic identification. Each class was evaluated using an 80% training and 20% validation split, with a separate, external test set of 25 samples per class. Pretrained models perform well (accuracy > 90%), providing a strong baseline for classification. However, enabling fine-tuning of the ML model to identify fossil shark teeth improves performance considerably. Sample size per class also affects classification accuracy: the smallest samples (n = 50 individuals per class) yielded a mean accuracy of 93.4%, whereas accuracy plateaued at ~99% between 200 and 500 images per class. Confidence likewise increases with larger samples, from 81.8% (n = 50 individuals per class) to >90% (n = 300 to 500 individuals per class). Misidentifications followed consistent patterns, reflecting morphological similarities and/or poor preservation. Artificially enlarging the training datasets through data augmentation improves the confidence of identifications. This research indicates that relatively small samples of vertebrate species (~50 to 500 individuals per class) can effectively train an ML model to identify these shark teeth with high accuracy.
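
The evaluation protocol described above (an 80%/20% training/validation split per class, a test set of 25 samples per class, and training subsets of varying size) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `split_per_class`, the placeholder labels `taxon_0` to `taxon_5`, and the synthetic label list are all hypothetical. The sketch also carves the test set out of the same image pool for simplicity, whereas the study describes its test set as separate and external.

```python
import random
from collections import defaultdict


def split_per_class(labels, n_per_class, test_per_class=25, train_frac=0.8, seed=0):
    """Return (train, val, test) lists of sample indices, split class by class.

    labels         -- one class label per image (hypothetical input format)
    n_per_class    -- images per class used for training plus validation
    test_per_class -- images per class held out for testing
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_class):
        indices = list(by_class[label])
        rng.shuffle(indices)
        if len(indices) < test_per_class + n_per_class:
            raise ValueError(f"class {label!r} has too few samples")

        # Hold out the test samples first so they never enter training.
        test.extend(indices[:test_per_class])

        # Subsample the requested number of images, then split 80/20.
        pool = indices[test_per_class:test_per_class + n_per_class]
        n_train = round(train_frac * len(pool))
        train.extend(pool[:n_train])
        val.extend(pool[n_train:])
    return train, val, test


# Synthetic labels only: six placeholder classes, 3150 entries in total.
labels = [f"taxon_{k}" for k in range(6) for _ in range(525)]

for n in (50, 200, 500):
    train, val, test = split_per_class(labels, n_per_class=n)
    print(n, len(train), len(val), len(test))
```

Repeating the split for several values of `n_per_class`, as in the loop above, is one way to set up the kind of sample-size comparison the abstract reports. Holding the test indices fixed across runs (same `seed`) keeps the accuracy figures comparable.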

Provider:

Dryad

Created:

2026-01-30
