Metabarcoding of soil environmental DNA replicates plant community variation but not specificity

NIAID Data Ecosystem, indexed 2026-03-13

Download link:

http://datadryad.org/dataset/doi%253A10.5061%252Fdryad.xgxd254j2

Description:

While metabarcoding of plant DNA from the environment is a promising method that can supplement inventories of live plant species, its accuracy and specificity have yet to be fully assessed over complex continuous landscapes. In this work, we evaluate plant community profiles produced via metabarcoding of soil by comparing them to a morphological survey. We assessed plant communities by metabarcoding of soil DNA at 130 sites along ecological gradients (nutrients, succession, moisture) in Denmark, using a primer set targeting the chloroplast trnL region (10-143 bp), and compared the resulting communities to communities produced with a longer nuclear ITS2 region (~216 bp) and to a morphological survey. We found that the community variation observed in the morphological survey was well represented by the molecular surveys, with significant correlations in both community composition and richness for both primer sets. While the majority of the ITS2 sequences could be assigned to species (over 80%), we had less success with the trnL sequences (70%), and that rate was only possible after restricting the reference database to local species. We conclude that the community profiles produced by metabarcoding can be highly effective in large-scale macroecological studies. However, the discovery rates and taxonomic assignments produced via metabarcoding remained inferior to those of morphological surveys. Manual curation of databases improves the specificity of assignments made with the trnL primers and the accuracy of assignments made with the ITS2 primers. Finally, we suggest that a greater percentage of named diversity would be recovered by increasing soil sampling together with the use of additional universal primer sets.

Methods

Sampling was performed across 130 sites (40 m x 40 m) in Denmark. For this study we generated new sequence data for trnL from existing DNA extracts, used already published sequence data for ITS2 (from the same DNA extracts), and combined these with published plant survey data from the same study sites. Detailed materials and methods are given in the associated publication and in Frøslev et al. (2017) and Brunbjerg et al. (2019).

This repository holds the following material:

TRNL SEQUENCE DATA:
A) trnl_fastq.tar.gz – Sequence data. Raw trnL sequencing data from MiSeq. 6 sequencing libraries (R1 + R2), with multiplexed primers, approximately 67-71 samples (PCR products) per library.
B) trnl_taglists.zip – PCR tagging. One file per library with the tag pairs used. Each tag is a 6 bp oligo preceding the primer.
C) trnl_replicate_Info.csv – PCR replicates. One file with PCR numbers (S001 and up) and corresponding sample numbers (like SN081). Each of the 130 samples was amplified in three PCR replicates.

ITS2 SEQUENCE DATA:
D) Raw ITS2 sequence data can be downloaded here: https://doi.org/10.5061/dryad.n9077
E) trnl_taglists.zip – PCR tagging. One file per library with the tag pairs used. Each tag is a 6 bp oligo preceding the primer.
F) Its2_replicate_Info.csv – PCR replicates. One file with PCR numbers (S001 and up) and corresponding sample numbers (like SN081). Each of the 130 samples was amplified in three PCR replicates.

PROCESSED DATA:
G) community_tables.zip – Taxonomically annotated tables used for the analyses. 7 tables in rds format (read in R with the function readRDS). For trnL and ITS2 there are tables annotated with local, regional and global reference databases, respectively. Each table contains: read counts for each OTU in each of the 130 sites; OTU_ID = a sha1 hash of the sequence; sequence = DNA sequence of the OTU; seq_len = length of the sequence; pident = match with reference sequence; taxonomic affiliation at 6 levels; and a field indicating whether the annotation is taxonomically redundant.
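As an illustration of two conventions described above (an OTU_ID that is a SHA-1 hash of the sequence, and three PCR replicates per sample), the following Python sketch may help when working with the tables outside R. It is not taken from the authors' pipeline. The hashing details (ASCII encoding, upper-casing, no trailing newline) and the idea of pooling replicates by summing read counts are assumptions, and the IDs in the demo are made up in the style of the S001 / SN081 numbering.

```python
import hashlib
from collections import defaultdict


def otu_id(sequence: str) -> str:
    """Return the SHA-1 hex digest of a DNA sequence.

    Assumption: the hash is computed over the upper-cased ASCII sequence
    with surrounding whitespace removed. The dataset description only
    says "a sha1 hash of the sequence", so verify this against the
    OTU_ID and sequence columns of a real table before relying on it.
    """
    cleaned = sequence.strip().upper()
    return hashlib.sha1(cleaned.encode("ascii")).hexdigest()


def pool_replicates(pcr_counts, pcr_to_sample):
    """Sum per-PCR read counts into per-sample read counts.

    pcr_counts:    {pcr_id: {otu_id: reads}}
    pcr_to_sample: {pcr_id: sample_id}, e.g. built from a replicate
                   info file (its column names are not documented here).

    Assumption: replicates are pooled by simple summation; the authors
    may have used a different rule.
    """
    pooled = defaultdict(lambda: defaultdict(int))
    for pcr_id, counts in pcr_counts.items():
        sample_id = pcr_to_sample[pcr_id]
        for otu, reads in counts.items():
            pooled[sample_id][otu] += reads
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(counts) for sample, counts in pooled.items()}


if __name__ == "__main__":
    # Hypothetical toy data: three PCR replicates of one sample.
    seq = "ACGTACGT"
    oid = otu_id(seq)
    mapping = {"S001": "SN081", "S002": "SN081", "S003": "SN081"}
    counts = {"S001": {oid: 10}, "S002": {oid: 4}, "S003": {oid: 0}}
    print(oid)
    print(pool_replicates(counts, mapping))
```

If the hashing assumption holds, recomputing `otu_id` for each `sequence` value gives a quick integrity check of a table: any row whose recomputed digest differs from its stored `OTU_ID` deserves a closer look.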
A further table holds the inventory (survey) data.

References:

Frøslev, T.G., R. Kjøller, H.H. Bruun, R. Ejrnæs, A.K. Brunbjerg, C. Pietroni and A.J. Hansen. 2017. Algorithm for post-clustering curation of DNA amplicon data yields reliable biodiversity estimates. Nature Communications 8(1): 1–11.

Brunbjerg, A.K., H.H. Bruun, K. Brøndum, A.T. Classen, L. Dalby, K. Fog, T.G. Frøslev, I. Goldberg, et al. 2019. A systematic survey of regional multi-taxon biodiversity: evaluating strategies and coverage. BMC Ecology 19(1): 1–15.

Created:

2022-02-25
