Do pseudogenes pose a problem for metabarcoding marine animal communities?

Source: DataONE; updated 2023-08-18; indexed 2024-06-08
Download link:
https://search.dataone.org/view/sha256:5935f53f99a58ce1b71ea50decbda2a9ee0ad8fe4ba3bd9fbe6ce51d1ae7b3d7
Resource description:
Because DNA metabarcoding typically uses sequence diversity among mitochondrial amplicons to estimate species composition, nuclear mitochondrial pseudogenes (NUMTs) can inflate apparent diversity. This study quantifies the incidence and attributes of NUMTs derived from the 658 bp barcode region of cytochrome c oxidase I (COI) in 156 marine animal genomes, and examines whether NUMTs can be recognized by the presence of indels or stop codons. In total, 309 NUMTs ≥ 150 bp were detected, an average of 1.98 per species (range = 0–33), with a mean length of 391 ± 200 bp. Of these, 75 (23.4%) lacked indels or stop codons. NUMTs appear to pose the greatest interpretational risk when short (< 313 bp) amplicons are used, as in eDNA studies, dietary analyses, or identification of processed fish. With the standard amplicon length for marine metabarcoding (313 bp), NUMTs could inflate the OTU count by 21% above the true species count while also raisi...

Data collection

We examined the incidence of COI NUMTs in marine animal genomes available through the NCBI genome browser (Clark et al. 2016). To identify candidate genomes, we compared taxonomic names in the World Register of Marine Species (WoRMS; Horton et al. 2020) with those in the NCBI genome browser (https://www.ncbi.nlm.nih.gov/genome/browse). All genomes of marine invertebrates were downloaded, together with genomes for at least one haphazardly selected species per order of marine vertebrates. When more than one genome was available for a species, we selected the reference genome if one existed, otherwise the most recent assembly. We also downloaded the COI sequence from each species' mitochondrial genome and used AliView (Larsson 2014) to extract the 658 bp region amplified by primers targeting the barcode region (Hebert et al. 2003). When available, the reference sequence for the full COI gene was also retained.
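The genome-selection rule above (the reference genome if available, otherwise the most recent assembly) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the `Assembly` record, its fields, and the function name `select_assemblies` are hypothetical, and the toy entries use placeholder names rather than real NCBI accessions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Assembly:
    # Hypothetical record: these fields are illustrative and are not the
    # NCBI genome browser's export format.
    species: str
    accession: str
    is_reference: bool
    release_date: date


def select_assemblies(assemblies: list[Assembly]) -> dict[str, Assembly]:
    """Keep one assembly per species: a reference genome if the species has
    one, otherwise its most recently released assembly."""
    by_species: dict[str, list[Assembly]] = {}
    for asm in assemblies:
        by_species.setdefault(asm.species, []).append(asm)

    chosen: dict[str, Assembly] = {}
    for species, candidates in by_species.items():
        references = [a for a in candidates if a.is_reference]
        pool = references or candidates
        # Within the pool the newest release wins; this also breaks ties if a
        # species were to have more than one reference genome.
        chosen[species] = max(pool, key=lambda a: a.release_date)
    return chosen


# Toy input with placeholder names, not real accessions.
toy = [
    Assembly("species_a", "asm_a1", False, date(2021, 3, 1)),
    Assembly("species_a", "asm_a2", True, date(2016, 7, 1)),
    Assembly("species_b", "asm_b1", False, date(2018, 1, 1)),
    Assembly("species_b", "asm_b2", False, date(2022, 9, 1)),
]
picked = select_assemblies(toy)
print({sp: asm.accession for sp, asm in picked.items()})
# -> {'species_a': 'asm_a2', 'species_b': 'asm_b2'}
```

Here `species_a` keeps its older reference assembly over a newer non-reference one, while `species_b`, which has no reference genome, falls back to its newest assembly.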
When a COI sequence was unavailable on GenBank, the Barcode of Life Database (BOLD...

The data used for analysis are contained in two .csv files, "Table S1 - List of COI NUMTs in sequenced marine genomes.csv" and "Table S2 - Summary of COI NUMTs by species.csv". Table S1 lists all of the COI hits found in each genome, including the data fields exported by Geneious Prime. Table S2 contains summary information for each organism included in the analysis, including GenBank accession numbers or Barcode Index Numbers, as applicable, for all genomes and COI query sequences employed in our analysis. The data columns in each file are described below.

Table S1 - List of COI NUMTs in sequenced marine genomes.csv
• Name – identifier for the hit
• #.Nucleotides – number of nucleotides in the BLASTn hit
• #.Sequences – number of sequences aligned in the BLASTn search
• %.Identical.Sites – percentage of identical sites
• %.Pairwise.Identity – percentage pairwise identity
• GC – GC content (%)
• Bit-Score – bit score
• Created.Date – date and...
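A minimal sketch of summarizing Table S1 with Python's standard library follows. It assumes that the `#.Nucleotides` column holds plain integer lengths and that every row is one candidate NUMT; the truncated column list above does not confirm either point. It recomputes the kinds of figures reported in the abstract: the number of hits ≥ 150 bp, their mean and standard deviation (assuming the "± 200 bp" is an SD), and hits per genome. The function name and output keys are hypothetical, and the toy rows are invented stand-ins for the real file.

```python
import csv
import io
from statistics import mean, stdev
from typing import TextIO

MIN_NUMT_BP = 150  # length cutoff reported in the abstract


def summarize_numt_hits(handle: TextIO, n_genomes: int) -> dict[str, float]:
    """Summarize hits of at least MIN_NUMT_BP nucleotides from a Table S1-style CSV.

    Assumes '#.Nucleotides' holds plain integers and that each row is a
    candidate NUMT; both are assumptions, not documented facts.
    """
    lengths = []
    for row in csv.DictReader(handle):
        n_bp = int(row["#.Nucleotides"])
        if n_bp >= MIN_NUMT_BP:
            lengths.append(n_bp)
    return {
        "n_hits": len(lengths),
        "mean_bp": mean(lengths) if lengths else 0.0,
        # Sample standard deviation; the abstract's "± 200 bp" is assumed to be an SD.
        "sd_bp": stdev(lengths) if len(lengths) > 1 else 0.0,
        "hits_per_genome": len(lengths) / n_genomes,
    }


# Toy rows standing in for the real file; the values are invented.
toy_csv = "Name,#.Nucleotides\nhit_1,658\nhit_2,120\nhit_3,300\n"
print(summarize_numt_hits(io.StringIO(toy_csv), n_genomes=2))
```

For the real file, one would pass a handle from `open(...)` with `newline=""`, as the `csv` module documentation recommends (the file's encoding is not stated and would need checking), together with `n_genomes=156`. As a check on the abstract's arithmetic, 309 hits over 156 genomes gives 309 / 156 ≈ 1.98 per species.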

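The abstract tests whether a NUMT can be recognized by indels or stop codons, but the source does not say how these were scored. The Python sketch below shows one simple way to do it. It assumes each candidate is already aligned to its species' reference barcode (gaps written as `-`) and that the reading frame of the ungapped NUMT is known. It includes stop codons for only two NCBI mitochondrial translation tables (2, vertebrate; 5, invertebrate). Other marine lineages use other mitochondrial codes, so the stop set would have to match each taxon. All names are hypothetical. A `False` result corresponds to the hard case in the abstract: a NUMT that passes this screen unnoticed, like the 75 reported without indels or stop codons.

```python
# Stop codons under two NCBI mitochondrial translation tables. Other marine
# taxa use other mitochondrial codes, so the stop set must match the lineage.
VERT_MITO_STOPS = frozenset({"TAA", "TAG", "AGA", "AGG"})  # NCBI table 2
INVERT_MITO_STOPS = frozenset({"TAA", "TAG"})  # NCBI table 5


def has_internal_gap(aligned: str) -> bool:
    """True if an aligned sequence has a gap between its first and last base.
    Terminal gaps only reflect partial overlap, so they are ignored."""
    return "-" in aligned.strip("-")


def in_frame_stops(seq: str, frame: int, stops: frozenset[str]) -> list[int]:
    """0-based codon indices of stop codons when `seq` is read from offset `frame`."""
    seq = seq.upper()
    return [
        i
        for i, start in enumerate(range(frame, len(seq) - 2, 3))
        if seq[start:start + 3] in stops
    ]


def numt_is_flagged(numt_aln: str, ref_aln: str, frame: int,
                    stops: frozenset[str]) -> bool:
    """Flag a candidate NUMT that has an indel relative to the reference
    barcode or an in-frame stop codon. `frame` is the offset of the first
    complete codon in the ungapped NUMT."""
    if has_internal_gap(numt_aln) or has_internal_gap(ref_aln):
        return True
    return bool(in_frame_stops(numt_aln.replace("-", ""), frame, stops))


ref = "ATGGCACTTAGC"  # toy "reference" fragment, not a real COI sequence
print(numt_is_flagged("ATGGCATAGAGC", ref, 0, INVERT_MITO_STOPS))  # True: TAG stop
print(numt_is_flagged("ATGGC-CTTAGC", ref, 0, INVERT_MITO_STOPS))  # True: internal gap
print(numt_is_flagged("ATGAGACTTAGC", ref, 0, INVERT_MITO_STOPS))  # False: AGA is not a stop in table 5
print(numt_is_flagged("ATGAGACTTAGC", ref, 0, VERT_MITO_STOPS))    # True: AGA is a stop in table 2
```

The last two calls show why the choice of genetic code matters. The same fragment passes as a clean NUMT under the invertebrate mitochondrial code but carries a stop codon under the vertebrate code.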
Created: 2023-11-29