
Data Frames for Spatial-Temporal Patterns of Perfluoroalkyl Substances in the Biota of the Laurentian Great Lakes: A Meta-Analysis

Source: Zenodo. Updated: 2026-02-21. Indexed: 2026-05-26.
Download link:
https://zenodo.org/doi/10.5281/zenodo.16173644
Resource description:
Directory

This site contains the two finalized, imputed data frames generated from the workflow of Martin et al. (2026): Spatial-Temporal Patterns of Perfluoroalkyl Substances in the Biota of the Laurentian Great Lakes: A Meta-Analysis. These two datasets were used to construct generalized additive models (GAMs) for the concentrations of six per- and polyfluoroalkyl substances (PFAS) commonly detected in tissue samples of biota from the watersheds of the Laurentian Great Lakes.

The compiled dataset includes 76 species/genera from eight taxonomic groups and seven trophic levels. Fish (Pisces: 50 species and 1,681 samples) and birds (Aves: 8 species and 633 samples) were the most frequently surveyed taxonomic groups. The most prominent trophic levels in the data were Quaternary Consumers (i.e., large-bodied piscivorous fish like salmon and pike; 15 species and 1,163 samples) and Piscivorous/Insectivorous Birds (e.g., gulls and cormorants; 6 species and 444 samples).

Descriptions of the two datasets included in the .zip file are as follows:

Finalized_Imputed_Data_Frame.csv: this file stores the finalized data frame used in modeling for six PFAS (PFOS, PFNA, PFDA, PFUnA, PFDoA, and PFTrDA). A total of 2,489 samples from 50 studies are included in the file (Table 1), with left-censored concentration values imputed using log-ratio Expectation-Maximization and Data Augmentation algorithms from the zCompositions package (Palarea-Albaladejo & Martín-Fernández, 2015). For a thorough description of the different variables included in the data frame, please see the accompanying table (Table 2).

Finalized_Supp_Validation_Data_Frame.csv: this file contains the set of 8 formatted data points that are used with the test set (Finalized_Imputed_Data_Frame.csv was split into 80% training and 20% test sets during modeling) to assess the predictive accuracy of the six GAMs. The set of included variables, as well as the general formatting, mirror the Finalized_Imputed_Data_Frame.csv file.

Citation Information

If you wish to use the finalized dataset for research purposes, please cite our publication (Martin et al., 2026), version v1.0.0 of the corresponding GitHub repository (Martin et al., 2026; https://doi.org/10.5281/zenodo.18683257), and this Zenodo data repository (Martin et al., 2025; https://doi.org/10.5281/zenodo.16173643).

**Disclaimer**

PLEASE NOTE: The release contains data that have been heavily formatted and processed for this meta-analysis. The data frame has not been reviewed or approved by agencies or entities that may have been involved in the individual studies included in the meta-analysis. In several cases, some of the meta-data associated with this data frame's samples were calculated as averages of ranges provided in the respective publications (e.g., an estimated average collection date of May 15th for samples that were collected from April-June). As a result, exact meta-data values may, in some cases, be inaccurate for specific samples, which is merely reflective of the information that was available in the publications for those samples.

If copies of the original raw data files are desired, we recommend contacting the corresponding authors of the appropriate papers and/or accessing the data portals where those files are available. The data sources we used for this project are provided in Table 1. To access the code used to generate these two data frames and model PFAS concentrations, please refer to the GitHub repository link at the bottom of the page.
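
For readers who want to reproduce the general evaluation setup described above, the following is a minimal, non-authoritative sketch in Python with pandas. It illustrates one way an 80%/20% split of Finalized_Imputed_Data_Frame.csv could be made and how the 8 rows of Finalized_Supp_Validation_Data_Frame.csv could be appended to the test set. The function names, the random seed, and the use of simple random sampling are assumptions for illustration only; the authors' actual procedure is in their GitHub repository and may differ (for example, it may be stratified). No column names are assumed beyond the two files sharing their variables.

```python
import pandas as pd


def split_train_test(df, test_frac=0.2, seed=0):
    """Randomly split a data frame into training and test parts.

    Assumption: plain random sampling with a fixed seed. The row index
    must be unique (true for the default index created by read_csv).
    """
    test = df.sample(frac=test_frac, random_state=seed)
    train = df.drop(index=test.index)
    return train, test


def build_evaluation_set(test, supplemental):
    """Append supplemental validation rows to the test set.

    Only columns present in both frames are kept, in the order they
    appear in the test set.
    """
    shared = [col for col in test.columns if col in supplemental.columns]
    return pd.concat([test[shared], supplemental[shared]], ignore_index=True)


# Hypothetical usage, assuming both CSV files were extracted from the .zip
# into the working directory:
#
#   main = pd.read_csv("Finalized_Imputed_Data_Frame.csv")
#   supp = pd.read_csv("Finalized_Supp_Validation_Data_Frame.csv")
#   train, test = split_train_test(main)
#   evaluation = build_evaluation_set(test, supp)
```

Keeping only the shared columns is a defensive choice: the description states that the supplemental file mirrors the main file's variables, so in practice this step should leave every column in place.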
Provider:
Zenodo
Created:
2025-08-12