AI and paleontology: Effects of vertebrate fossil sample size on machine learning image classification

DataONE · Updated 2026-01-30 · Indexed 2026-02-07
Download link:
https://search.dataone.org/view/sha256:2b5babb4c85bacbe898feefceee71b23baee497c9196b8ef871d597f58a19554
Description:
With the growing application of artificial intelligence (AI) and machine learning (ML), there is great potential to leverage these technologies in paleontology. Compared with many other scientific fields, a key challenge for ML in paleontology is small sample size, particularly for fossil vertebrates. Shark teeth, which are abundant in the fossil record, provide a model system for applying ML across varying sample sizes. Here we use six classes (taxa) of Neogene shark teeth for taxonomic identification, drawing on a curated dataset of 3150 images. Each class was evaluated using an 80% training and 20% validation split, with a separate, external test set of 25 samples per class. Pretrained models perform well (accuracy > 90%), providing a strong baseline for classification. However, enabling fine-tuning of the ML model to identify fossil shark teeth improves performance considerably. Sample size per class also affects the accuracy of the models' classifications. Smaller sample sizes (n = 50 ...

# AI and paleontology: Effects of vertebrate fossil sample size on machine learning image classification

Dataset DOI: [10.5061/dryad.zpc866tpq](https://doi.org/10.5061/dryad.zpc866tpq)

## Description of the data and file structure

A complete README file, formatted as a .pdf, is included in the file 1 README 1.28.26.pdf.
The other files are listed as follows:

* 3c_Optimal_pixel_density_ML_model_results.csv
* 5b_Database_optimal_perf_ML_results_WO_fine_tuning.csv
* 5c_Database_optimal_perf_ML_results_W_fine_tuning.csv
* 5d_Variance_dataset.csv
* 6b_Data_Augmentation-FT_model__RG-HF.csv
* 9_SharkAI_R_code_2025_sections_356.R
* 1_README_1.29.2026.pdf
* 2_Complete_study_image_data_set.pdf
* 3ab_Optimal_pixel_density_context_and_statistical_results.pdf
* 4_Original_Python-Tensorflow_files_including_model_code.pdf
* 5a_Optimal_performance_statistical_results.pdf
* 6a_Data_augmentation_statisical_results_.pdf
* 7_Misidentification_analysis.pdf
* 8_GBIF_plots_of_fossil_Lamnidae_and_Carcharhin...
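The evaluation design described in the abstract (an 80% training / 20% validation split made within each class, varying the number of images per class, and scoring on a separate test set of 25 images per class) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' pipeline, which is documented in 4_Original_Python-Tensorflow_files_including_model_code.pdf and may split or sample differently. The helper names `subsample_and_split` and `per_class_accuracy`, the fixed seed, and the taxon and file names are all hypothetical. The sketch also shows one practical consequence of a 25-image test set: each misidentification changes that class's test accuracy by 1/25, i.e. 4 percentage points.

```python
import random
from collections import defaultdict


def subsample_and_split(items_by_class, n_per_class=None, train_frac=0.8, seed=0):
    """Hypothetical helper: optionally subsample each class to n_per_class
    items, then split each class independently into train/validation sets."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, val = {}, {}
    for label, items in items_by_class.items():
        items = list(items)
        rng.shuffle(items)
        if n_per_class is not None:
            if n_per_class > len(items):
                raise ValueError(f"class {label!r} has only {len(items)} items")
            items = items[:n_per_class]  # simulate a smaller sample size
        n_train = round(train_frac * len(items))  # e.g. 80% of 50 -> 40
        train[label] = items[:n_train]
        val[label] = items[n_train:]
    return train, val


def per_class_accuracy(y_true, y_pred):
    """Fraction of correct predictions for each true class label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return {label: correct[label] / total[label] for label in total}


if __name__ == "__main__":
    # Six hypothetical taxa with placeholder file names (not the real dataset).
    items = {f"taxon_{k}": [f"taxon_{k}/img_{i:03d}.jpg" for i in range(500)]
             for k in range(6)}
    train, val = subsample_and_split(items, n_per_class=50)
    print(len(train["taxon_0"]), len(val["taxon_0"]))  # 40 10

    # One error among 25 external test images for a class.
    print(per_class_accuracy(["a"] * 25, ["a"] * 24 + ["b"]))
```

Splitting within each class, rather than across the pooled images, keeps every taxon represented in the validation set in the same 80/20 proportion even when a class is subsampled to only a few dozen images.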
Created:
2026-01-30