Estimation of species abundance based on the number of segregating sites using environmental DNA (eDNA)

DataONE, updated 2024-04-24, indexed 2024-06-08

Download link:

https://search.dataone.org/view/sha256:953cc497d3fa4ccc0d0064735312be7386de18f5b57c86e994bcc19f8e4816a8

Resource description:

The advancement of environmental DNA (eDNA) has enabled rapid and non-invasive species detection in aquatic environments. While most studies focus on detecting species presence or absence, recent research has explored using eDNA data to quantify species abundance. This estimation is usually based on the concentration of targeted eDNA. However, eDNA concentration can be influenced by various biotic and abiotic factors, which can obscure the relationship between concentration and species abundance. In this study, we suggest using the number of segregating sites as a proxy for estimating species abundance. We investigated this relationship in silico, in vitro, and in situ (mesocosm experiments) using two brackish goby species, Acanthogobius hasta and Tridentiger bifasciatus. Analysis of simulated and in vitro data, where DNA was mixed from a known number of individuals, revealed a strong correlation between the number of segregating sites and species abundance (R² > 0.9; P < 0....

We first assessed the relationship between the number of segregating sites and species abundance using entirely simulated sequences. The length of the simulated sequences was set to 17,000 bp, close to the total size of the 11 target segments. The number of simulated sequences (individuals) was 1,000, and sequences were generated at a mutation rate of 10⁻⁶ /bp/gen. To account for mutation rate variation among species, we also generated two further datasets at mutation rates of 10⁻⁷ /bp/gen and 10⁻⁸ /bp/gen. All data were generated using the software Fastsimcoal2 (Excoffier & Foll, 2011). Subsets of sequences were randomly chosen from the simulated data, with subset sizes ranging from 20 to 980 sequences in steps of 20. The selected sequences were aligned using MUSCLE v1.0 (Edgar, 2004), and the number of segregating sites was then counted from the alignments. The simulation process was repeated three times at each specified number of sequences.
The correlation between the number of segregating sites and ...

# Estimation of Species Abundance Based on the Number of Segregating Sites using Environmental DNA (eDNA)

[https://doi.org/10.5061/dryad.w3r2280zz](https://doi.org/10.5061/dryad.w3r2280zz)

This is a set of simulated sequences for exploring the relationship between the number of segregating sites and species abundance.

## Description of the data and file structure

We generated the sequences at three mutation rates: 10⁻⁶ /bp/gen, 10⁻⁷ /bp/gen and 10⁻⁸ /bp/gen, corresponding to `Simulated_seq__mutation_rate-10-6.fa`, `Simulated_seq__mutation_rate-10-7.fa` and `Simulated_seq__mutation_rate-10-8.fa`. For each mutation rate we generated 1,000 sequences with a sequence length of 17,000 bp. No insertions or deletions were simulated. Only loci with mutations are included in the files.

## Code/Software

All data were generated using the software Fastsimcoal2 (Excoffier & Foll, 2011).
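The subsampling procedure described above (random subsets of 20 to 980 sequences, three replicates per size, segregating sites counted per subset) can be sketched in Python as follows. This is a minimal illustration and not the authors' code: the function names `parse_fasta`, `count_segregating_sites` and `subsample_curve` are hypothetical, and the sketch assumes that all sequences in a file have equal length, so it skips the MUSCLE alignment step used in the study. That assumption rests on the README's statement that no insertions or deletions were simulated.

```python
import random
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> List[str]:
    """Return the sequences from FASTA-formatted lines, in file order."""
    seqs: List[str] = []
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if it had any sequence.
            if chunks:
                seqs.append("".join(chunks))
            chunks = []
        else:
            chunks.append(line.upper())
    if chunks:
        seqs.append("".join(chunks))
    return seqs


def count_segregating_sites(seqs: List[str]) -> int:
    """Count columns in which at least two different characters occur."""
    if not seqs:
        return 0
    length = len(seqs[0])
    # Assumption: equal-length sequences, so columns are already homologous.
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must have equal length (no indels assumed)")
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def subsample_curve(
    seqs: List[str],
    sizes: Iterable[int],
    replicates: int = 3,
    seed: int = 0,
) -> Dict[int, List[int]]:
    """For each subset size, draw `replicates` random subsets (without
    replacement) and record the number of segregating sites in each."""
    rng = random.Random(seed)
    return {
        n: [count_segregating_sites(rng.sample(seqs, n)) for _ in range(replicates)]
        for n in sizes
    }


# Toy input standing in for one of the .fa files.
toy = parse_fasta([">ind1", "ACGT", ">ind2", "ACGA", ">ind3", "TCGA"])
toy_curve = subsample_curve(toy, sizes=[2, 3], replicates=3, seed=1)
```

To apply the sketch to the real data, one would open a file such as `Simulated_seq__mutation_rate-10-6.fa`, pass the file handle to `parse_fasta`, and call `subsample_curve` with `sizes=range(20, 1000, 20)` to cover subset sizes 20 through 980.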

Created:

2025-07-30
