five

Machine Learning Correlation of Electron Micrographs and ToF-SIMS for the Analysis of Organic Biomarkers in Mudstone

收藏
Figshare2026-04-28 收录
下载链接:
https://figshare.com/articles/dataset/Machine_Learning_Correlation_of_Electron_Micrographs_and_ToF-SIMS_for_the_Analysis_of_Organic_Biomarkers_in_Mudstone/28061686
下载链接
链接失效反馈
官方服务:
资源简介:
The spatial distribution of organics in geological samples can be used to determine when and how these organics were incorporated into the host rock. Mass spectrometry (MS) imaging can rapidly collect a large amount of data, but ions produced are mixed without discrimination, resulting in complex mass spectra that can be difficult to interpret. Here, we apply unsupervised and supervised machine learning (ML) to help interpret spectra from time-of-flight-secondary ion mass spectrometry (ToF-SIMS) of an organic-carbon-rich mudstone of the Middle Jurassic of England (UK). It was previously shown that the presence of sterane molecular biomarkers in this sample can be detected via ToF-SIMS (Pasterski, M. J. et al., Astrobiology 2023, 23, 936). We use unsupervised ML on scanning electron microscopy–electron dispersive spectroscopy (SEM-EDS) measurements to define compositional categories based on differences in elemental abundances. We then test the ability of four ML algorithmsk-nearest neighbors (KNN), recursive partitioning and regressive trees (RPART), eXtreme gradient boost (XGBoost), and random forest (RF)to classify the ToF-SIM spectra using (1) the categories assigned via SEM-EDS, (2) organic and inorganic labels assigned via SEM-EDS, and (3) the presence or absence of detectable steranes in ToF-SIMS spectra. In terms of predictive accuracy and balanced accuracy, KNN was the best performing model and RPART the worst. The feature importance, or the specific features of the ToF-SIM spectra used by the models to make classifications, cannot be determined for KNN, preventing posthoc model interpretation. Nevertheless, the feature importance extracted from the other models was useful for interpreting spectra. We determined that some of the organic ions used to classify biomarker containing spectra may be fragment ions derived from kerogen which is abundant in this mudstone sample.
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作