AI and paleontology: Effects of vertebrate fossil sample size on machine learning image classification
Source: DataCite Commons; updated 2026-03-05; indexed 2026-04-25
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.zpc866tpq
Resource description:
With the growing application of artificial intelligence (AI) and machine
learning (ML), great potential exists to leverage these technologies in
paleontology. Relative to many other scientific fields, a challenge of ML
applied to paleontology is small sample sizes, particularly for fossil
vertebrates. Shark teeth, abundant in the fossil record, provide a model
system to use ML across varying sample sizes. Here we use six classes
(taxa) of Neogene shark teeth for taxonomic identification, including a
curated dataset of 3,150 images. Each class was evaluated using an 80%
training and 20% validation split, with a separate, external test set of
25 samples per class. Pretrained models perform well (accuracy > 90%),
providing a strong baseline for classification. Nevertheless, enabling
fine-tuning of the ML model to identify fossil shark teeth improves
performance considerably. Sample size per class also affects the
accuracy of the models’ classifications. Smaller sample sizes (n = 50
individuals per class) yielded a mean accuracy of 93.4%, whereas accuracy
plateaued at ~99% between 200 and 500 images per class. Confidence likewise
increases with larger samples, from 81.8% (n = 50 individuals per class) to
>90% (n = 300 to 500 individuals per class). Misidentifications
followed consistent patterns, reflecting morphological similarities and/or
poor preservation. Artificially increasing the training datasets using
data augmentation improves the confidence of identifications. This
research indicates that relatively small samples of vertebrate species
(~50 to 500 individuals per class) can effectively train an ML model to
identify these shark teeth with high levels of accuracy.
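The evaluation protocol summarized in the abstract (drawing n images per class, an 80/20 training/validation split, a separate held-out test set, and accuracy and confidence as metrics) can be sketched in Python as follows. This is a minimal, hypothetical sketch and not the authors' code: the function names, the representation of images as ID strings, the fixed random seed, and the definition of "confidence" as the softmax probability of the top-ranked class are all assumptions, and model training and data augmentation are omitted.

```python
"""Hypothetical sketch of the sampling and evaluation protocol in the
abstract; names and the confidence definition are assumptions."""
import math
import random


def subsample_and_split(images_by_class, n_per_class, train_frac=0.8, seed=0):
    """Draw n_per_class images per class, then split each class into
    training and validation lists (80/20 by default, stratified by class)."""
    rng = random.Random(seed)
    train, val = [], []
    for label, images in sorted(images_by_class.items()):
        if len(images) < n_per_class:
            raise ValueError(f"class {label!r} has only {len(images)} images")
        chosen = rng.sample(images, n_per_class)  # sampling without replacement
        cut = round(train_frac * n_per_class)
        train += [(img, label) for img in chosen[:cut]]
        val += [(img, label) for img in chosen[cut:]]
    return train, val


def softmax(logits):
    """Numerically stable softmax over one image's class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def accuracy_and_confidence(logits_list, true_labels, class_names):
    """Score a held-out test set. Returns (accuracy, mean confidence), where
    confidence is assumed to be the softmax probability of the top class."""
    if not true_labels:
        raise ValueError("empty test set")
    correct = 0
    confidences = []
    for logits, truth in zip(logits_list, true_labels):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if class_names[best] == truth:
            correct += 1
        confidences.append(probs[best])
    n = len(true_labels)
    return correct / n, sum(confidences) / n


if __name__ == "__main__":
    # Hypothetical pool: six classes with 500 image IDs each.
    classes = [f"taxon_{i}" for i in range(6)]
    pool = {c: [f"{c}_{j:03d}.jpg" for j in range(500)] for c in classes}
    for n in (50, 200, 500):
        train, val = subsample_and_split(pool, n)
        print(n, len(train), len(val))
```

Running the script prints the training and validation counts for each subsample size (for n = 50, 240 training and 60 validation images across the six classes). Drawing a fresh subsample for each n, rather than nesting smaller samples inside larger ones, is one design choice among several; the abstract does not state which approach the study used.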
Provided by:
Dryad
Created:
2026-01-30



